As science and technology continue to develop and new technologies keep emerging, data is becoming increasingly important, and it is also growing faster than anticipated. The emergence of the Internet of Things and cloud computing has brought enormous challenges to data mining, knowledge discovery, and related fields, but has also given them new vitality. The successful application of the Internet of Things gives data both temporal and spatial characteristics, which increases the dimensionality of the data as well as its volume, so some traditional data mining tools and algorithms become inefficient. Meanwhile, the computing power and the simple parallel programming model offered by cloud computing platforms solve the problems caused by large volumes of data to some extent. Rough set theory is a successful data mining tool, but it too becomes inefficient in the face of ever-growing data. In this paper, we use the Map/Reduce model to move the traditional serial algorithm into the cloud environment. We first briefly introduce the Map/Reduce process and rough set theory, then extend the programming theory for the cloud environment and propose the corresponding algorithm, and finally verify the effectiveness of the algorithm through complexity analysis and experiments.
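As a rough illustration of how a rough set computation can be phrased in Map/Reduce terms, the following minimal sketch computes the lower and upper approximations of one decision class. It is not the algorithm proposed in the paper, which this abstract does not spell out. It runs in a single process and only imitates the map and reduce phases, and the function names (`map_phase`, `reduce_phase`, `approximations`) and the toy records are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(records, condition_attrs):
    """Map step: emit (key, value) pairs, where the key is the tuple of
    condition-attribute values and the value is (object id, decision)."""
    for obj_id, attrs, decision in records:
        key = tuple(attrs[a] for a in condition_attrs)
        yield key, (obj_id, decision)

def reduce_phase(pairs):
    """Reduce step: group values by key. Objects sharing a key are
    indiscernible, so each group is one equivalence class."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def approximations(groups, target_decision):
    """Build the lower and upper approximations of the set of objects
    whose decision equals target_decision."""
    lower, upper = set(), set()
    for members in groups.values():
        ids = {obj_id for obj_id, _ in members}
        decisions = {d for _, d in members}
        if target_decision in decisions:
            upper |= ids              # class overlaps the target set
            if decisions == {target_decision}:
                lower |= ids          # class lies entirely inside it
    return lower, upper

# Hypothetical toy decision table: (object id, condition attributes, decision).
records = [
    (1, {"temp": "high", "humid": "yes"}, "flu"),
    (2, {"temp": "high", "humid": "yes"}, "none"),
    (3, {"temp": "low", "humid": "no"}, "none"),
    (4, {"temp": "high", "humid": "no"}, "flu"),
]

groups = reduce_phase(map_phase(records, ["temp", "humid"]))
lower, upper = approximations(groups, "flu")
print(sorted(lower), sorted(upper))
```

Objects 1 and 2 share the same condition values but differ in decision, so they fall only in the upper approximation, while object 4 is in both. In a real cluster, each mapper would process one split of the decision table and the grouping by key would be done by the framework's shuffle stage rather than by an in-memory dictionary.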