当数据量从GB级上升至TB级甚至PB级时,具有高性能的并行数据库在保证扩展性和容错性的同时计算代价会很高。针对该问题,设计一种面向大规模数据处理的并行数据库引擎FlexDB。利用Map Reduce的并行计算框架作为通信层,调度和协调集群中各节点的计算和通信。实验结果表明,FlexDB的系统性能接近于并行数据库,并且具有较好的扩展性和容错性。
When the amount of data from GB goes up to TB level or even PB level,parallel database with high performance cost too much in order to achieve scalability and fault tolerance.To address the problem,this paper designs a parallel database engine——FlexDB,which is based on Map Reduce.The parallel computing framework of Map Reduce is as a communication layer of FlexDB which is to assign computing tasks and coordinate communications among all nodes in cluster.Experimental results show that the FlexDB system performance is close to parallel database,and has good expansibility and fault tolerance.