With the growth of the Internet, a large volume of online Uyghur text has appeared, creating an urgent need for sentiment orientation analysis across different domains. Because Uyghur lacks both sufficient sentiment training data and a complete sentiment lexicon, this paper combines the strengths of the lexicon-based method and the corpus-based (machine learning) method in a classifier model called LCUSCM (Lexicon-based and Corpus-based Uyghur Text Sentiment Classification Model). The model first classifies the corpus with a Uyghur sentiment lexicon built manually by the authors, expanding the lexicon recursively during classification. Then, based on the sentiment score of each sentence, it selects part of the lexicon-classified sentences to train a classifier, which is used to refine the results of the first step. The accuracy of this hybrid method is 9.13% higher than that of the machine learning method alone and 1.82% higher than that of the lexicon-based method.
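The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `threshold` parameter, the choice of a Naive Bayes classifier, and the English placeholder tokens are all assumptions made for illustration, and the lexicon expansion step is omitted because its rule is not specified here.

```python
import math
from collections import Counter

LABELS = ("pos", "neg")

def lexicon_score(tokens, lexicon):
    """Sum the polarity weights of the tokens found in the lexicon."""
    return sum(lexicon.get(t, 0) for t in tokens)

def train_naive_bayes(labeled):
    """Train a multinomial Naive Bayes model from (tokens, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for tokens, label in labeled:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab

def nb_predict(tokens, model):
    """Return the most probable label, or None if the model saw no data."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label in LABELS:
        if doc_counts[label] == 0:
            continue  # no training sentences for this class
        logp = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for t in tokens:
            if t in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing
                logp += math.log((word_counts[label][t] + 1)
                                 / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

def hybrid_classify(sentences, lexicon, threshold=2):
    """Stage 1: label by lexicon score. Stage 2: train on the confident
    sentences (|score| >= threshold, an assumed selection rule) and
    relabel the remaining ones with the trained classifier."""
    scores = [lexicon_score(s, lexicon) for s in sentences]
    confident = [(s, "pos" if sc > 0 else "neg")
                 for s, sc in zip(sentences, scores) if abs(sc) >= threshold]
    model = train_naive_bayes(confident)
    labels = []
    for s, sc in zip(sentences, scores):
        if abs(sc) >= threshold:
            labels.append("pos" if sc > 0 else "neg")
        else:
            # Fall back to the sign of the lexicon score if the model is empty
            labels.append(nb_predict(s, model) or ("pos" if sc >= 0 else "neg"))
    return labels

# Toy example with placeholder tokens (not real Uyghur data)
lexicon = {"good": 1, "great": 1, "bad": -1, "awful": -1}
sentences = [["good", "great", "service"], ["bad", "awful", "delay"],
             ["service", "good"], ["delay"]]
print(hybrid_classify(sentences, lexicon))  # ['pos', 'neg', 'pos', 'neg']
```

In this toy run the last sentence contains no lexicon word, so the lexicon alone cannot decide it; the classifier trained on the two confident sentences labels it negative because "delay" occurred in the confident negative sentence.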