Research on Detecting Spam Tags Using the KPCA-SVM Method

ISSN: 1673-629X
Journal: Computer Technology and Development (《计算机技术与发展》)
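Classification: TP301 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] School of Computer, Electronics and Information, Guangxi University, Nanning 530004, Guangxi
Funding: Humanities and Social Sciences Research Project of the Ministry of Education (11YJAZH080)

Keywords: data dimension reduction, kernel principal component analysis (KPCA), support vector machine (SVM), social spam (spam tags)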

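Abstract (translated from the Chinese):

When high-dimensional data are processed, the number of samples required grows exponentially while the distances between samples become less informative, which leads to the curse of dimensionality. Text tag data often suffer from excessive dimensionality, which hampers the detection of spam tags. Building on the mathematical model of the support vector machine (SVM), this paper constructs a large-scale spam tag detection model for Folksonomy. To reduce the impact of high dimensionality on spam tag detection, and inspired by kernel principal component analysis (KPCA) theory, the idea of dimensionality reduction is introduced into the field of data reduction, and a KPCA-based reduction model for large-scale SVM data sets is proposed. This is finally instantiated as a new spam tag detection method: a large-scale spam tag detection model based on kernel principal component analysis and support vector machines (KPCA-SVM). In spam tag detection, the model shortens testing time without affecting the characteristics of the data, and its detection performance is good.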

English abstract:

When high-dimensional data are processed, the number of samples needed increases exponentially while the distances between samples become less informative, which leads to the curse of dimensionality. Text label data usually face this problem of high dimensionality, which impairs the detection of social spam. This paper takes advantage of the mathematical model of the Support Vector Machine (SVM) to construct a large-scale social spam detection model for Folksonomy. To reduce the influence of high-dimensional data, and inspired by kernel principal component analysis theory, the idea of data dimension reduction is introduced into data reduction, and a large-scale SVM data set reduction model based on kernel principal component analysis is proposed. The result is a new social spam detection method: a large-scale social spam detection model based on kernel principal component analysis and support vector machine (KPCA-SVM). In social spam detection, this model shortens the test time without affecting the characteristics of the data, and it achieves good detection performance.
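
The abstract does not state the kernel, the number of retained components, the SVM solver, or the data set, so the following is only a minimal sketch of the general KPCA-then-SVM idea and not the authors' implementation. Everything specific in it is an assumption made for illustration: the RBF kernel and its gamma, the choice of two components, the Pegasos-style subgradient solver standing in for an SVM trainer, and the synthetic six-dimensional "tag feature" vectors. It uses only the Python standard library.

```python
import math
import random

def rbf(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def fit_kpca(X, k, gamma, iters=300, seed=0):
    """Approximate the top-k eigenpairs of the centred kernel matrix."""
    n = len(X)
    K = [[rbf(a, b, gamma) for b in X] for a in X]
    row_mean = [sum(r) / n for r in K]   # K is symmetric, so these are also column means
    total = sum(row_mean) / n
    # Centre the kernel matrix in feature space.
    M = [[K[i][j] - row_mean[i] - row_mean[j] + total for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    lams, vecs = [], []
    for _ in range(k):                   # power iteration, then deflation
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(iters):
            w = [sum(m * x for m, x in zip(row, v)) for row in M]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for the unit vector v.
        lam = sum(vi * sum(m * x for m, x in zip(row, v)) for vi, row in zip(v, M))
        lams.append(lam)
        vecs.append(v)
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return {"X": X, "gamma": gamma, "row_mean": row_mean,
            "total": total, "lams": lams, "vecs": vecs}

def project(model, x):
    """Map one sample onto the retained kernel principal components."""
    X, n = model["X"], len(model["X"])
    kx = [rbf(x, xi, model["gamma"]) for xi in X]
    mean_kx = sum(kx) / n
    # Centre the new kernel vector consistently with the training matrix.
    kc = [kx[i] - mean_kx - model["row_mean"][i] + model["total"] for i in range(n)]
    return [sum(a * b for a, b in zip(v, kc)) / math.sqrt(lam)
            for lam, v in zip(model["lams"], model["vecs"])]

def train_linear_svm(Z, y, lam=0.01, steps=2000, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss.
    No bias term: the KPCA features are already centred."""
    rng = random.Random(seed)
    w = [0.0] * len(Z[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(Z))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(a * b for a, b in zip(w, Z[i]))
        w = [(1 - eta * lam) * a for a in w]          # regularisation shrinkage
        if margin < 1:                                # hinge loss is active
            w = [a + eta * y[i] * b for a, b in zip(w, Z[i])]
    return w

def predict(w, z):
    return 1 if sum(a * b for a, b in zip(w, z)) >= 0 else -1

# Synthetic stand-in data (hypothetical): +1 = spam tag, -1 = legitimate tag.
data_rng = random.Random(42)
def sample(center, d=6):
    return [center + data_rng.gauss(0, 0.3) for _ in range(d)]

X = [sample(1.0) for _ in range(20)] + [sample(-1.0) for _ in range(20)]
y = [1] * 20 + [-1] * 20

model = fit_kpca(X, k=2, gamma=0.1)       # reduce 6 features to 2 components
Z = [project(model, x) for x in X]
w = train_linear_svm(Z, y)
acc = sum(predict(w, z) == t for z, t in zip(Z, y)) / len(y)
print(len(X[0]), "->", len(Z[0]), "features; training accuracy:", acc)
```

In this sketch the classifier only ever sees the reduced representation returned by `project`, which is the sense in which a dimension-reduction step can shorten testing: each new sample is scored against two components rather than its full feature vector. Whether this matches the reduction model in the paper cannot be confirmed from the abstract alone.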
