为提高文本分类的准确性与效率,提出一种基于潜在语义分析和改进的超球支持向量机的文本分类模型。该模型利用潜在语义分析进行特征抽取,消除同义词和多义词在文本表示时所造成的偏差,实现文本向量的降维。针对超球重叠区域的文本分类问题,设计一种新的决策方法:基于密集度的决策策略。实验结果表明,该模型在类别数目较小时具有较好的分类效果,改进的算法有效可行。
To improve the accuracy and efficiency of text classification, a text classification model based on latent semantic analysis (LSA) and an improved hyper-sphere support vector machine is proposed. The model uses LSA for feature extraction, which removes the bias that synonymy and polysemy introduce into the text representation and reduces the dimensionality of the text vectors. For texts that fall in regions where hyper-spheres overlap, a new density-based decision strategy is designed. Experimental results show that the model classifies well when the number of classes is small and that the improved algorithm is effective and feasible.
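The decision step described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define how density is measured, so the sketch assumes that the density of a class at a point is the fraction of that class's training vectors lying within a hypothetical radius `h` of the point. As a further simplification, each hyper-sphere is taken as the class centroid with the largest centroid-to-sample distance as its radius, rather than the minimum enclosing sphere obtained by solving the SVM optimization problem. All function names are illustrative.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_spheres(samples):
    """Map each label to (center, radius).

    Simplification: centroid plus maximum distance, standing in for
    the sphere a hyper-sphere SVM would learn.
    """
    spheres = {}
    for label, vecs in samples.items():
        dim = len(vecs[0])
        center = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        radius = max(dist(center, v) for v in vecs)
        spheres[label] = (center, radius)
    return spheres

def local_density(x, vecs, h):
    """Assumed density measure: share of a class's vectors within h of x."""
    return sum(1 for v in vecs if dist(x, v) <= h) / len(vecs)

def classify(x, samples, spheres, h=0.5):
    """Assign x to a class, using density only in overlapping regions."""
    inside = [lab for lab, (c, r) in spheres.items() if dist(x, c) <= r]
    if len(inside) == 1:
        # Unambiguous case: x lies in exactly one hyper-sphere.
        return inside[0]
    if not inside:
        # Outside every sphere: pick the sphere whose surface is nearest.
        return min(spheres,
                   key=lambda lab: dist(x, spheres[lab][0]) - spheres[lab][1])
    # Overlap region: pick the class that is densest around x.
    return max(inside, key=lambda lab: local_density(x, samples[lab], h))

if __name__ == "__main__":
    # Toy 2-D vectors standing in for LSA-reduced text vectors.
    samples = {
        "a": [(0, 0), (1, 0), (2, 0), (2.5, 0)],
        "b": [(2, 0), (4, 0), (5, 0), (6, 0)],
    }
    spheres = fit_spheres(samples)
    print(classify((2.3, 0), samples, spheres))  # point in the overlap region
```

In this toy data the two spheres overlap between 2 and 2.75 on the first axis, so the point (2.3, 0) is decided by local density rather than by sphere membership alone. In the proposed model the input vectors would come from the LSA feature-extraction step; the sketch takes them as given.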