To mine massive data efficiently and at low cost, and so provide references for enterprise decision-making, a massive data mining model based on cloud computing is proposed. In this model, both the processing and the storage of massive data take place in a cloud computing environment. First, the massive data is preprocessed into data with a consistent structure. Next, the MapReduce model on the cloud computing platform processes the data efficiently in parallel. Finally, the required data mining results are obtained. Cloud-based massive data mining is clearly more efficient than traditional data mining, and the accuracy of its results is also somewhat improved. As the data volume grows, the advantage of the model becomes increasingly evident.
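The abstract names a pipeline (preprocess heterogeneous data into one structure, then map, shuffle and reduce) but does not name the mining task. The following is a minimal sketch, assuming an item-frequency count as a stand-in task. The record fields (`customer`, `cust_id`, `items`, `basket`) and all function names are hypothetical. Everything runs in a single process. On a real cloud platform such as Hadoop, the map and reduce tasks would run in parallel across nodes, and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw records from heterogeneous sources: field names and formats differ.
RAW_RECORDS = [
    {"customer": "A", "items": "milk,bread"},
    {"cust_id": "B", "basket": ["bread", "eggs"]},
    {"customer": "C", "items": "Milk, eggs ,bread"},
]


def preprocess(record):
    """Normalize one raw record into a uniform (customer_id, [items]) structure."""
    cid = record.get("customer") or record.get("cust_id")
    items = record.get("items", record.get("basket", []))
    if isinstance(items, str):
        items = items.split(",")
    # Trim whitespace, lowercase, and drop empty entries so all sources match.
    return cid, [i.strip().lower() for i in items if i.strip()]


def map_phase(record):
    """Map: emit an (item, 1) pair for every item in a normalized record."""
    _, items = record
    return [(item, 1) for item in items]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in real MapReduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum all counts emitted for one key."""
    return key, sum(values)


def run_job(raw_records):
    """Run preprocess -> map -> shuffle -> reduce sequentially and return counts."""
    normalized = [preprocess(r) for r in raw_records]
    intermediate = chain.from_iterable(map_phase(r) for r in normalized)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(RAW_RECORDS))  # prints {'milk': 2, 'bread': 3, 'eggs': 2}
```

The preprocessing step is what lets a single map function handle every record. This mirrors the abstract's requirement that data be brought into a consistent structure before the MapReduce stage.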