The data revolution has brought unprecedented development. Monitoring, analyzing, collecting, storing, and applying rich and complex structured, semi-structured, and unstructured data has become the mainstream of development in the information age, and classifying and processing the information contained in massive datasets calls for better solutions. Traditional data mining classification methods can no longer meet this demand. To address these problems, this paper analyzes and improves several classification algorithms used in data mining and combines them into an improved SVM_KNN classification algorithm. On this basis, the algorithm is parallelized in the MapReduce model on the Hadoop cloud computing platform so that it can be applied to big data processing. Finally, the algorithm is validated experimentally on a dataset. Compared with the traditional SVM classification algorithm, the results show that the improved algorithm is efficient, fast, accurate, and low-cost, and can effectively perform big data classification.
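The abstract does not say how the improved SVM_KNN algorithm works. As a rough illustration only, the sketch below follows one commonly cited SVM-KNN hybrid rather than the paper's own method: the SVM labels a sample directly when its decision value is far from zero, and a k-nearest-neighbour vote over the support vectors decides samples close to the separating hyperplane. It assumes a pre-trained linear SVM. The weights `W`, `B`, the support vectors, the threshold `eps`, `k`, the data splits, and all function names are hypothetical. `mapper` and `run_job` mimic the map, shuffle, and reduce steps in a single process; they are not the Hadoop API.

```python
from collections import Counter, defaultdict
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical, already-trained linear SVM: f(x) = w . x + b.
# Its margin lines are x1 + x2 = 0.5 (f = -1) and x1 + x2 = 1.5 (f = +1).
W, B = (2.0, 2.0), -2.0
# Hypothetical support vectors (point, label) lying on that margin.
SUPPORT_VECTORS = [((0.25, 0.25), -1), ((0.0, 0.5), -1),
                   ((0.75, 0.75), 1), ((1.0, 0.5), 1)]


def svm_score(x, w=W, b=B):
    """Signed decision value of the linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def knn_vote(x, support_vectors=SUPPORT_VECTORS, k=3):
    """Majority label among the k support vectors closest to x."""
    nearest = sorted(support_vectors, key=lambda sv: dist(x, sv[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def svm_knn_classify(x, eps=1.0, k=3):
    """Trust the SVM far from the hyperplane; fall back to KNN near it."""
    f = svm_score(x)
    if abs(f) > eps:
        return 1 if f > 0 else -1
    return knn_vote(x, k=k)


def mapper(split):
    """Map task: classify one data split and emit (outcome, 1) pairs."""
    for x, y in split:
        yield ("correct" if svm_knn_classify(x) == y else "wrong", 1)


def run_job(splits):
    """In-process stand-in for map -> shuffle -> reduce (not the Hadoop API)."""
    grouped = defaultdict(list)  # shuffle: group emitted values by key
    for split in splits:
        for key, value in mapper(split):
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}  # reduce


# Two hypothetical data splits, as if stored as separate HDFS blocks.
splits = [
    [((2.0, 2.0), 1), ((0.0, 0.0), -1)],
    [((0.6, 0.5), 1), ((0.9, 0.05), 1), ((3.0, 0.0), -1)],
]
counts = run_job(splits)
print(counts)  # {'correct': 4, 'wrong': 1}
```

The point `(0.9, 0.05)` shows where the hybrid rule matters. Its SVM decision value is slightly negative, so a plain SVM would label it -1. Because it falls inside the margin band, the KNN vote over nearby support vectors decides instead and returns +1. Each map task needs only the small model (weights and support vectors), which is why this style of classifier splits naturally across independent data blocks.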