维吾尔语是形态变化复杂的黏着性语言,维吾尔语词干词缀切分对维吾尔语信息处理具有非常重要的意义,但到目前为止,维吾尔语词干提取的性能仍存在较大的改进空间。该文以N-gram模型为基本框架,根据维吾尔语的构词约束条件,提出了融合词性特征和上下文词干信息的维吾尔语词干提取模型。实验结果表明,词性特征和上下文词干信息可以显著提高维吾尔语词干提取的准确率,与基准系统比较,融入了词性特征和上下文词干信息的实验准确率分别达到了95.19%和96.60%。
Uyghur is an agglutinative language with complex morphology, Uyghur words stem segmentation plays an important role in Uyghur language information processing. But so far, the performance of the Uyghur words stem segmentation still has much room for improvement . According to the constraints of Uyghur word formation, we proposed a stem segmentation model for Uyghur which fuses the part of speech feature and context information based on N--gram model. Experimental results show that, the part of speech feature and the context information of stem can increase the performance of Uyghur words stem segmentation significantly with the accuracy reaching 95. 19% and 96.60% respectively compared to the baseline system.