[Purpose/significance] Massive data analysis in a cloud computing environment is a form of data computation that requires preloading large data sets. To address the analysis quality and efficiency problems caused by the way traditional massive data analysis handles data details, this paper applies Google's three core cloud computing technologies to improve it. [Method/process] Through a literature survey, content analysis and technical analysis of Google's three core cloud computing technologies, GFS, MapReduce and Bigtable, this paper identifies their deployment innovations and design improvements in data processing, technical architecture and algorithm models. [Result/conclusion] By comparing Google's cloud computing technologies with traditional local approaches to data analysis and processing, this paper identifies the processing advantages these technologies offer for massive data analysis. Drawing on the three technologies, it proposes technical optimization and improvement strategies for the massive data analysis workflow in three areas: storage and access, organization and management, and parallel processing.