MapReduce is an efficient distributed parallel computing framework that lets users readily run parallel computations on large clusters, but it is poorly compatible with traditional relational databases. To address this problem, this paper proposes ChunkDB, a distributed relational database based on a chunk structure, and extends the MapReduce architecture so that ChunkDB and MapReduce work together effectively. The design combines the scalability, ease of operation, and high parallelism of MapReduce with the query-optimization advantages of relational databases, such as indexing. Experiments show that the MapReduce-based ChunkDB database provides fast and efficient parallel queries for data warehouse applications.