大数据时代的到来伴随着海量数据,进而使得筛选出具有价值的信息成为大数据被广泛应用的核心步骤.在此情况下Apache Hadoop顺势而生,其通过简化数据密集、高度并行的分布式应用来应对大数据带来的挑战.由于目前基于Hadoop的大数据平台在多领域普遍使用,从而平台搭建成为进行大数据探索的第一步.而很多文章介绍的平台搭建是在虚拟机中完成,与真实情况存在相应差异.本文讨论以真实集群为基础搭建Hadoop平台的原因,Hadoop集群的强大功能,搭建平台所需设备、环境、安装、设置及测试过程.
The age of big data is companied by massive data, making the selection of valuable information become a core step for wide usage of big data. Apache Hadoop is invented in this case and addressing the challenges from big data via simplifying data intensive and highly parallel distributed applications. The current big data based on Hadoop platform is widely used, so constructing a platform becomes the first step of exploration in big data. This paper describes the reason of Hadoop platform construct based on real cluster and the powerful function of Hadoop cluster as well as equipment, environment, installation, setting and testing process in the construction process.