The SVM algorithm and the naive Bayes classification algorithm both perform well when classifying large volumes of complex data. However, each has significant drawbacks that degrade its classification results, and traditional data mining classification algorithms cannot meet the demands of processing massive data. To address these problems, this paper analyzes the traditional naive Bayes algorithm, proposes improvements to it, and presents the SVM_WNB classification algorithm. The algorithm is then parallelized on the Hadoop cloud platform so that it can process big data. Experiments show that the improved algorithm is clearly more accurate and more efficient, and it is expected to be effective for big-data classification.
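The abstract does not say how SVM_WNB assigns attribute weights or how the SVM component is combined with naive Bayes. The following Python sketch is a hedged illustration of the two general ingredients the abstract names. It is a weighted naive Bayes classifier that scores a class $c$ as $\log P(c) + \sum_i w_i \log P(x_i \mid c)$, and it gathers its training counts in a map/reduce style, the pattern a Hadoop MapReduce job would distribute across data splits. The function names (`map_record`, `reduce_counts`, `train`, `predict`), the Laplace smoothing, the caller-supplied weight vector, and the toy data are all assumptions made for illustration. None of them is the authors' method. When every weight is 1.0, the score is that of ordinary naive Bayes.

```python
import math
from collections import Counter
from itertools import chain


def map_record(record):
    """Map step (hypothetical): emit (key, 1) pairs for one labelled record.

    A record is (features, label), where features is a tuple of categorical values.
    """
    features, label = record
    yield ("class", label), 1
    for i, value in enumerate(features):
        yield ("attr", label, i, value), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def train(records):
    # On Hadoop the map calls would run in parallel over data splits;
    # here they are simply chained together in a single process.
    counts = reduce_counts(chain.from_iterable(map_record(r) for r in records))
    n_features = len(records[0][0])
    # Distinct values seen per attribute, used for Laplace smoothing.
    values = [set() for _ in range(n_features)]
    for key in counts:
        if key[0] == "attr":
            values[key[2]].add(key[3])
    return counts, values


def predict(model, features, weights):
    """Return the class maximising log P(c) + sum_i w_i * log P(x_i | c)."""
    counts, values = model
    classes = [key[1] for key in counts if key[0] == "class"]
    total = sum(counts[("class", c)] for c in classes)
    best, best_score = None, -math.inf
    for c in classes:
        n_c = counts[("class", c)]
        score = math.log(n_c / total)
        for i, value in enumerate(features):
            # Laplace-smoothed estimate of P(x_i = value | c); unseen values get count 0.
            p = (counts[("attr", c, i, value)] + 1) / (n_c + len(values[i]))
            score += weights[i] * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy usage with made-up categorical data.
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rain", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rain", "cool"), "yes"),
]
model = train(data)
print(predict(model, ("rain", "mild"), weights=[1.0, 1.0]))  # prints "yes"
```

Keeping the map and reduce steps as separate pure functions mirrors how the counting would be split across Hadoop mappers and reducers. Only the final summed counts are needed for prediction, so the classifier itself does not care how the counts were aggregated.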