To improve the accuracy and efficiency of text categorization, this paper proposes a model based on Latent Semantic Analysis (LSA) and the Hyper-sphere Support Vector Machine (HS-SVM). Because a standard SVM converges slowly on large-scale text collections, the HS-SVM is applied to text categorization, with an incremental-learning HS-SVM algorithm used for both training and classification. Experiments show that the HS-SVM effectively addresses this weakness of the SVM: it achieves accuracy comparable to that of the SVM in text categorization while significantly reducing model complexity and training time.
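The abstract does not spell out the learning algorithm, so the following is only an illustrative sketch of the general idea: each class is represented by one hypersphere (a center and a radius) that is updated one sample at a time. The sketch assumes the document vectors have already been reduced by LSA. The names `HyperSphere`, `train`, and `classify` are hypothetical, and the running-mean center with a grow-only radius is a simplifying heuristic, not the quadratic-programming solution an actual HS-SVM would compute.

```python
import math


class HyperSphere:
    """One class region: a center and a radius in the reduced feature space."""

    def __init__(self, dim):
        self.center = [0.0] * dim
        self.radius = 0.0
        self.count = 0

    def distance(self, x):
        """Euclidean distance from x to the current center."""
        return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, self.center)))

    def update(self, x):
        """Incremental step: fold in one sample without revisiting old ones."""
        self.count += 1
        # The running mean moves the center toward the new sample.
        self.center = [c + (a - c) / self.count for a, c in zip(x, self.center)]
        # Grow the radius only if the sample lies outside the sphere.
        # Heuristic: earlier samples are not guaranteed to stay enclosed.
        self.radius = max(self.radius, self.distance(x))


def train(samples, dim):
    """Build one hypersphere per label from (vector, label) pairs."""
    spheres = {}
    for x, label in samples:
        spheres.setdefault(label, HyperSphere(dim)).update(x)
    return spheres


def classify(spheres, x):
    """Pick the label whose sphere boundary is closest to x."""
    return min(spheres, key=lambda lab: spheres[lab].distance(x) - spheres[lab].radius)


# Toy 2-D vectors standing in for LSA-reduced documents (assumed data).
samples = [
    ([1.0, 0.1], "sports"), ([0.9, 0.2], "sports"), ([1.1, 0.0], "sports"),
    ([0.1, 1.0], "finance"), ([0.2, 0.9], "finance"), ([0.0, 1.1], "finance"),
]
spheres = train(samples, dim=2)
print(classify(spheres, [0.8, 0.3]))  # prints "sports"
```

Because each class is summarized by a single center and radius, the stored model stays small and each new sample costs one constant-time update, which is the kind of saving in model complexity and training time that the abstract reports for the HS-SVM.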