Research on Massive Data Mining Based on Cloud Computing

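ISSN: 1673-629X
Journal: Computer Technology and Development (《计算机技术与发展》)
Date: 0
Classification: TP31 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Management Engineering, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710061; [2] School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710061
Funding: National Natural Science Foundation of China (61100165/F020508); Natural Science Foundation of Shaanxi Province (2007F18)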

Keywords: cloud computing, data mining, massive data, MapReduce, data preprocessing

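Abstract (translated from the Chinese):

To mine massive data efficiently and at low cost, and to give enterprises a reference for decision-making, the paper proposes a massive data mining model based on cloud computing. In this model, massive data are both processed and stored in a cloud computing environment. The data are first preprocessed into a structurally consistent form; the MapReduce model on the cloud computing platform then processes them efficiently in parallel; finally, the required data mining results are obtained. Massive data mining based on cloud computing is markedly more efficient than traditional data mining, the accuracy of the mining results improves to some extent, and the model's advantage grows more pronounced as the volume of data increases.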

English abstract:

In order to achieve efficient, low-cost mining of massive data and to provide decision references for enterprises, a model of massive data mining based on cloud computing is proposed. In this model, the processing and storage of the massive data are carried out in the cloud computing environment. First, the massive data are preprocessed to form data with the same structure. Then, the MapReduce model on the cloud computing platform is used to process the data efficiently in parallel. Finally, the needed data mining results are obtained. The efficiency of massive data mining based on cloud computing is clearly higher than that of traditional data mining. Meanwhile, the accuracy of data mining is improved. As the amount of data increases, the advantage of the model becomes increasingly obvious.
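
The pipeline the abstract outlines (preprocess the data into a uniform structure, then process it with MapReduce) can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not run on a real cloud platform; it only imitates the map, shuffle, and reduce stages in plain Python. The record layout (customer, item), the sample data, and every function name below are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from itertools import chain


def preprocess(raw_records):
    """Normalize mixed-format records into (customer, item) tuples.

    Assumed input formats (hypothetical): dicts with "customer"/"item"
    string values, or "customer,item" strings. Anything else is dropped.
    """
    cleaned = []
    for rec in raw_records:
        if isinstance(rec, dict):
            customer, item = rec.get("customer"), rec.get("item")
        elif isinstance(rec, str) and rec.count(",") == 1:
            customer, item = rec.split(",")
        else:
            continue  # malformed record: skip it
        if customer and item:
            cleaned.append((customer.strip().lower(), item.strip().lower()))
    return cleaned


def map_phase(record):
    """Map step: emit (item, 1) for each uniform record."""
    _customer, item = record
    yield (item, 1)


def shuffle(pairs):
    """Shuffle step: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: total the counts collected for one key."""
    return key, sum(values)


def run_job(raw_records):
    """Run preprocessing followed by map, shuffle, and reduce."""
    records = preprocess(raw_records)
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


# Hypothetical sample input mixing two formats and two bad records.
raw = [
    {"customer": "A", "item": "Milk"},
    "b, milk",
    {"customer": "C", "item": " Bread "},
    "malformed line",
    {"customer": "D"},
]
counts = run_job(raw)
print(counts)
```

On a real platform the map and reduce functions would run in parallel across many nodes, with the framework handling the shuffle; the sketch keeps everything in one process so that the data flow between the stages stays visible.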
