Distributed Naive Bayes Text Classification Based on Hadoop

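Citation: Wei Jie (卫洁), Shi Hongbo (石洪波), Ji Suqin (冀素琴). Distributed Naive Bayes Text Classification Based on Hadoop. Computer Systems & Applications (计算机系统应用), 2012.3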
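Classification: TP393 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] School of Information Management, Shanxi University of Finance and Economics, Taiyuan 030006
Funding: National Natural Science Foundation of China (60873100); Shanxi University of Finance and Economics research funding project
Related project: Research on Bayesian classifiers and discriminative learning methods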

Keywords: Hadoop, naive Bayes, MapReduce, text classification

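Chinese abstract (translated):

The advent of cloud computing has effectively addressed the storage, analysis, and processing of massive data sets. On an open-source Hadoop distributed cluster, which is one implementation of cloud computing, the authors use the MapReduce parallel programming model to design and implement a distributed naive Bayes text classification algorithm with an improved TFIDF. Experimental results show that the Hadoop-based distributed naive Bayes automatic text classifier not only tolerates node failures but is also efficient and easy to scale.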

English abstract:

The emergence of cloud computing has effectively resolved the difficulty of storing, analysing, and processing massive data sets. On a distributed cluster running Hadoop, an open-source implementation of cloud computing, the MapReduce parallel programming model is used to design and implement a distributed naive Bayes text classification algorithm with a modified TFIDF. The experimental results show that the automatic naive Bayes text classifier built on the Hadoop framework can not only handle node failures but is also highly efficient and readily scalable.
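The abstract does not spell out the algorithm, so the following is only a minimal sketch of how naive Bayes training can be phrased as a map step and a reduce step. It assumes plain multinomial naive Bayes with add-one smoothing on raw word counts; the paper's TFIDF modification is not described in this record and is not reproduced here. The map and reduce phases are simulated in a single Python process rather than run on Hadoop, and all function names (map_document, reduce_counts, train, classify) are hypothetical.

```python
import math
from collections import defaultdict


def map_document(label, text):
    """Map step: emit one count per document and one per (label, word) pair."""
    yield ("doc", label, None), 1
    for word in text.lower().split():
        yield ("word", label, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the values of all pairs that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def train(docs):
    """Run the map step over every labeled document, then reduce the output."""
    pairs = [kv for label, text in docs for kv in map_document(label, text)]
    return reduce_counts(pairs)


def classify(counts, text):
    """Pick the label with the highest log-posterior under add-one smoothing."""
    doc_counts = {k[1]: v for k, v in counts.items() if k[0] == "doc"}
    vocab = {k[2] for k in counts if k[0] == "word"}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in doc_counts.items():
        words_in_label = sum(
            v for k, v in counts.items() if k[0] == "word" and k[1] == label
        )
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            c = counts.get(("word", label, word), 0)
            score += math.log((c + 1) / (words_in_label + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for illustration.
docs = [
    ("sports", "football match goal"),
    ("sports", "team wins match"),
    ("tech", "hadoop cluster node"),
    ("tech", "mapreduce job cluster"),
]
model = train(docs)
print(classify(model, "cluster node failure"))
```

Because the reduce step only sums counts, partial results computed on separate nodes can be merged in any order, which is the property that makes this kind of training easy to distribute and to rerun after a node failure.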