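This paper studies dimensionality reduction and methods for improving classification accuracy and efficiency in text classification, and proposes a new text classification algorithm based on matrix projection, the Matrix Projection (MP) classification algorithm. A matrix operation projects the three-dimensional space that represents text features in the training examples onto a two-dimensional space and yields a normalized vector, which effectively achieves both dimensionality reduction and accurate computation of feature term weights. Comparative experiments against several other text classification algorithms show that the MP algorithm clearly improves both classification accuracy and time performance, with macro-averaged F1 values of 92.29% and 96.03% on two data sets.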
A new algorithm, the matrix projection (MP) algorithm, is proposed for text classification to address the key problems of reducing feature dimensionality and improving efficiency and accuracy. It is based on a matrix operation that projects the three-dimensional feature space of the training samples onto a two-dimensional feature space and obtains a normalized feature vector, thereby reducing the feature dimension and computing feature term weights accurately. Compared with several typical algorithms, the proposed algorithm is markedly superior in accuracy and time, and its macro-averaged F1 value reaches 92.29% and 96.03%, respectively, on two typical data sets.
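
The macro-averaged F1 reported above is the unweighted mean of the per-class F1 scores, so every category counts equally regardless of how many documents it holds. The following is a minimal sketch of that metric only, not of the MP algorithm itself; the function name `macro_f1` and the toy labels in the usage note are illustrative assumptions and do not come from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Classes are taken from the union of true and predicted labels.
    A class with no true positives, false positives, or false
    negatives contributes an F1 of 0.0 (an assumed convention).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed

    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic
        # mean of precision and recall.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with hypothetical labels `y_true = ["a", "a", "b", "b"]` and `y_pred = ["a", "b", "b", "b"]`, class `a` scores 2/3 and class `b` scores 4/5, so `macro_f1(y_true, y_pred)` is their mean, about 0.733.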