This paper studies the key techniques of text classification, including text representation and classifiers, and implements a text classifier on the system using the k-nearest neighbor (KNN) classification algorithm. On this basis, experimental data are used to study and compare in detail how factors such as the sample set and the value of K affect classification performance. By analyzing the causes of the observed performance changes, the paper proposes a solution that gives the best performance.
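As a rough illustration of the kind of classifier described above, the following is a minimal sketch of KNN text classification. The abstract does not specify the text representation or the similarity measure, so the bag-of-words term-frequency vectors, the cosine similarity, the function names (`vectorize`, `cosine`, `knn_classify`), and the toy `TRAINING_SET` are all assumptions made for illustration, not the paper's implementation.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (assumed measure)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, training_set, k=3):
    """Label `text` by majority vote among its k most similar training texts."""
    query = vectorize(text)
    # Score every training document against the query, most similar first.
    scored = sorted(
        ((cosine(query, vectorize(doc)), label) for doc, label in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Count the labels of the k nearest neighbors and return the most common one.
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy sample set, invented purely for this example.
TRAINING_SET = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the team after the match", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell as interest rates rose", "finance"),
    ("investors sold stock after the bank report", "finance"),
]

if __name__ == "__main__":
    for k in (1, 3, 5):
        print(k, knn_classify("the team scored a goal in the match", TRAINING_SET, k=k))
```

Both factors the abstract names appear here as inputs: the sample set is the `training_set` argument and K is the `k` argument, so their effect on the predicted label can be explored by varying either one.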