This study applies Conditional Random Field (CRF) and Support Vector Machine (SVM) models to Mongolian part-of-speech (POS) tagging, adding features such as context, suffixes, and the letters of Mongolian words. The experiments cover nearly 40,000 Mongolian sentences of 8 to 25 words each, tagged with the new Mongolian POS tag set. Both models tag Mongolian parts of speech well. In particular, the SVM model that combines contextual features with the "agglutinative inflectional suffix" feature reaches an accuracy of over 99%.
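The three feature families named above (context, suffix, and letters) can be illustrated with a minimal sketch. It is not the feature template used in the study: the function names, the window size, the number of letters, and the assumption that each token arrives already paired with its attached inflectional suffix are all hypothetical, and the tokens in the usage note are placeholders rather than Mongolian words.

```python
from typing import Dict, List, Tuple

# Assumption: each token is given as (word form, attached inflectional suffix),
# with an empty string when no suffix was identified. How the suffix is
# obtained is outside this sketch.
Token = Tuple[str, str]


def token_features(sent: List[Token], i: int,
                   window: int = 1, n_letters: int = 2) -> Dict[str, str]:
    """Build a feature dict for the token at position i (hypothetical template)."""
    word, suffix = sent[i]
    feats = {
        "word": word,
        # Suffix feature: the inflectional suffix written together with the stem.
        "suffix": suffix or "<none>",
        # Letter features: leading and trailing letters of the word form.
        "first_letters": word[:n_letters],
        "last_letters": word[-n_letters:],
    }
    # Context features: neighbouring words and their suffixes within the window.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(sent):
            feats[f"word[{off:+d}]"] = sent[j][0]
            feats[f"suffix[{off:+d}]"] = sent[j][1] or "<none>"
        else:
            # Positions outside the sentence get a padding symbol.
            feats[f"word[{off:+d}]"] = "<pad>"
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, str]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(sent, i) for i in range(len(sent))]
```

Calling `sentence_features([("aa", ""), ("bbx", "x"), ("cc", "")])` yields one dict per token. Dicts of this shape could then be passed to a CRF toolkit as a per-sentence sequence, or vectorized token by token for an SVM classifier; which toolkit and encoding the study used is not stated in the abstract.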