With the rapid growth of Uyghur-language text on the Internet, Uyghur text categorization has become a key technique for processing and organizing this large volume of text data. This paper studies techniques and methods for Uyghur text categorization. To address the high dimensionality of Uyghur texts under the vector space model (VSM) representation, stemming is combined with information gain (IG) to reduce the dimensionality of the representation space. Categorization experiments are carried out on a Uyghur text corpus using machine-learning-based classification algorithms (kNN and Naïve Bayes), and the experimental results are analyzed.
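As an illustration of the IG-based dimensionality reduction mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes each document has already been stemmed and reduced to a set of stems, and it uses the standard presence/absence form of information gain, IG(t) = H(C) - [P(t) H(C|t) + P(¬t) H(C|¬t)]. The function names (`entropy`, `information_gain`, `select_features`) and the toy English tokens are hypothetical placeholders, not taken from the paper or its corpus.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """IG of a term: class entropy minus the entropy that remains after
    splitting the documents by presence/absence of the term."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Keep the k stems with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Toy data (assumption): each document is a set of already-stemmed tokens.
docs = [{"goal", "team"}, {"goal", "match"}, {"bank", "loan"}, {"bank", "market"}]
labels = ["sport", "sport", "econ", "econ"]
selected = select_features(docs, labels, k=2)
```

In this toy example the stems that perfectly separate the two classes receive the highest score and are retained, while stems that appear in only one document score lower and are dropped. The reduced vocabulary would then define the VSM dimensions passed to a classifier such as kNN or Naïve Bayes.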