To make full use of I/O resources and improve data analysis efficiency, this paper considers the characteristics of the data analysis procedure and data storage in high energy physics. Using the Java Native Interface (JNI), it develops C++ interfaces for accessing HBase and proposes an analysis platform with fully localized data access. It also designs the related MapReduce algorithms and software components, and improves CPU utilization through the optimized allocation and combination of Mapper tasks. In addition, by integrating the high energy physics data analysis environment, the job management system, and the ROOT plotting module, it implements a new Web user interface that simplifies user operations. Test results show that, compared with a traditional data analysis system based on file storage, the new platform analyzes data faster and scales better.