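Tibetan part-of-speech tagging is a very important foundational problem in Tibetan information processing. Taking the maximum entropy model as its basic framework, this paper defines and selects feature templates according to the word-formation characteristics of Tibetan and the results of statistical analysis, and studies a maximum entropy Tibetan part-of-speech tagging model that incorporates linguistic features. Experimental results show that the maximum entropy model handles Tibetan part-of-speech tagging well, and that syllable features markedly improve tagging performance, lowering the error rate by 6.4% relative to the baseline system.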
Tibetan part-of-speech (POS) tagging is an important problem in Tibetan natural language processing. This paper studies the fusion of morphological features into a maximum entropy model for Tibetan POS tagging, defining feature templates based on an analysis of Tibetan script and on statistical results. Experimental results show that the maximum entropy model achieves good results on Tibetan POS tagging, and that syllable features improve tagging performance significantly, yielding an error reduction of 6.4% compared to the baseline.
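
As a rough illustration of what syllable-aware feature templates for a maximum entropy tagger might look like, the following minimal sketch instantiates a few templates for one word in a segmented sentence. The abstract does not list the paper's actual templates, so every template name here (`w0`, `w-1`, `first_syl`, and so on) and both helper functions are hypothetical. The only linguistic fact assumed is that Tibetan syllables are delimited by the tsheg mark (U+0F0B).

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg), used here as the syllable delimiter


def split_syllables(word):
    """Split a Tibetan word into syllables on the tsheg mark, dropping empty pieces."""
    return [s for s in word.split(TSHEG) if s]


def extract_features(words, i):
    """Instantiate hypothetical feature templates for words[i].

    These templates are illustrative assumptions, not the templates
    defined in the paper.
    """
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<BOS>"
    next_word = words[i + 1] if i + 1 < len(words) else "<EOS>"
    syllables = split_syllables(word) or [word]
    return {
        # Context-word templates (assumed to resemble a typical baseline)
        "w0": word,
        "w-1": prev_word,
        "w+1": next_word,
        "w-1|w0": prev_word + "|" + word,
        # Syllable templates (assumed form of the syllable features)
        "first_syl": syllables[0],
        "last_syl": syllables[-1],
        "n_syl": len(syllables),
    }


if __name__ == "__main__":
    # A two-word segmented example; the first word has two syllables.
    sentence = ["བོད་ཡིག", "སློབ"]
    for idx in range(len(sentence)):
        print(extract_features(sentence, idx))
```

Feature dictionaries of this kind could then be passed to any multinomial logistic regression trainer, which is the same model family as a maximum entropy classifier. Which templates actually help is an empirical question that the paper addresses through its own experiments.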