Scalability is a key performance metric for parallel analytical database clusters. When data grows rapidly or some cluster servers fail, the data must still be redistributed to new cluster nodes quickly, reliably, and safely, which requires a well-designed data partitioning strategy. This paper borrows the consistent hashing idea commonly used in Key-Value storage and applies it to parallel analytical database clusters, proposing a consistent-hashing partitioning method for large-scale structured data and designing a concrete partitioning scheme within the MapReduce framework. Finally, using TPC-DS as the benchmark, the scheme is compared against similar systems; the experimental results show that it achieves both good partitioning performance and good scalability.
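The abstract refers to the consistent hashing technique from Key-Value stores, whose defining property is that adding or removing a server node only remaps the keys in the affected arc of the hash ring. The sketch below illustrates that general technique only, not the paper's specific partitioning scheme; the class name, the use of MD5, and the virtual-node count are illustrative assumptions.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring sketch (illustrative, not the
    paper's method): each key maps to the first node clockwise from
    its hash position, so node changes only remap a small key arc."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes per server, to smooth load
        self.ring = {}         # hash position -> node name
        self.positions = []    # sorted hash positions
        for n in nodes:
            self.add_node(n)

    def _hash(self, key):
        # MD5 chosen only for its uniform spread; any stable hash works
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            self.ring[h] = node
            bisect.insort(self.positions, h)

    def remove_node(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            del self.ring[h]
            self.positions.remove(h)

    def get_node(self, key):
        if not self.positions:
            return None
        # first ring position >= hash(key), wrapping around at the end
        idx = bisect.bisect(self.positions, self._hash(key)) % len(self.positions)
        return self.ring[self.positions[idx]]
```

For example, after removing one node, every key that was stored on a surviving node still maps to the same node; only the removed node's keys move. This is the property that lets a cluster redistribute data incrementally when servers fail or are added.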