A Kazakh Text Classification Method Based on the SVM-Modified KNN Algorithm

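ISSN: 1001-988X
Journal: Journal of Northwest Normal University (Natural Science Edition)
Time: 0
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Electronics and Information Engineering, Yili Normal University, Yining, Xinjiang 835000; [2] School of Computer and Information Science, Northeast Normal University, Changchun, Jilin 130117
Funding: National Natural Science Foundation of China (61363066); Doctoral Program Foundation of the Ministry of Education (20110043110011); Jilin Province Science and Technology Development Plan Project (20120302); Yili Normal University institutional project (2012YB017)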

Keywords: stemming, DFR, VSM, SVM-KNN

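Chinese abstract (translated):

To classify Kazakh-language text, a stemming method for Kazakh text is given based on the grammar rules of Kazakh. Kazakh text is preprocessed by combining the DFR feature selection method with the VSM text representation model, and a text classification algorithm in which an SVM and a modified KNN work together is proposed. Extensive text classification simulation experiments are run on a self-built corpus and on a compiled Kazakh-language dataset from Xinjiang Daily. The results show that the method achieves good classification performance on Kazakh text and outperforms both SVM and KNN in testing.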

English abstract:

To classify Kazakh-language text, this paper presents a Kazakh stemming method based on the features of the Kazakh language and implements Kazakh text preprocessing by combining DFR feature selection with the VSM model. It then proposes an SVM-modified KNN algorithm and runs a large number of text categorization experiments on a self-built dataset and on the Xinjiang Daily Kazakh dataset. The experimental results show that the method achieves good classification performance on Kazakh text and that its test performance is better than that of SVM and KNN.
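
The abstracts do not say how the SVM and the modified KNN cooperate, or how the KNN is modified. As an illustration only, the sketch below shows one common way to combine the two: a linear SVM decides samples that lie clearly on one side of the hyperplane, and a distance-weighted KNN decides samples that fall within a small band around it. Everything here is an assumption rather than the paper's method: the function names, the threshold `eps`, the distance weighting, and the toy 2-D vectors that stand in for VSM document vectors.

```python
import math

def decision(x, w, b):
    """Signed score of a linear SVM: w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Fit a linear SVM by subgradient descent on the L2-regularised
    hinge loss. Labels must be +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision(xi, w, b) < 1
            w = [(1 - eta * lam) * wj for wj in w]  # shrink step (regulariser)
            if violated:  # hinge loss is active for this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def weighted_knn(x, X, y, k=3):
    """Distance-weighted vote of the k nearest training samples
    (a hypothetical stand-in for the paper's 'modified KNN')."""
    neighbours = sorted((math.dist(x, xi), yi) for xi, yi in zip(X, y))[:k]
    vote = sum(yi / (d + 1e-9) for d, yi in neighbours)
    return 1 if vote >= 0 else -1

def svm_knn_predict(x, w, b, X, y, eps=0.5, k=3):
    """Use the SVM when |score| >= eps, otherwise fall back to KNN.
    Returns (label, route) so the caller can see which model decided."""
    score = decision(x, w, b)
    if abs(score) >= eps:
        return (1 if score > 0 else -1), "svm"
    return weighted_knn(x, X, y, k), "knn"

# Toy data: two linearly separable classes in a 2-D feature space.
X_train = [[2.0, 0.0], [3.0, 0.0], [2.0, 1.0],
           [0.0, 2.0], [0.0, 3.0], [1.0, 2.0]]
y_train = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    w, b = train_linear_svm(X_train, y_train)
    for sample in ([5.0, 0.0], [0.0, 5.0], [2.0, 2.2]):
        print(sample, svm_knn_predict(sample, w, b, X_train, y_train))
```

In a real pipeline the input vectors would come from the stemming, DFR feature selection, and VSM steps named in the abstract. A multi-class corpus would also need a one-vs-rest or one-vs-one wrapper around the binary classifier.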
