This paper studies hash join optimization for database systems running on today's mainstream chip multiprocessors (CMPs) with a shared cache. It first proposes a multithreaded execution framework for hash join based on the Radix-Join algorithm, and uses two examples to analyze the factors that affect the performance of multithreaded Radix-Join. Building on this analysis, it optimizes the performance of each kind of thread in the framework and the way those threads access the shared cache, optimizes the memory accesses of the hash join algorithm during the cluster join phase, and analyzes the theoretical speedup of multithreaded cluster partitioning. The proposed framework is implemented in the open-source databases INGRES and EaseDB, and its performance is measured experimentally. The results show that the proposed algorithm effectively resolves shared-cache access conflicts among threads and load imbalance across processor cores during hash join execution, and greatly improves hash join performance.
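The two phases named above, cluster partitioning and cluster join, can be illustrated with a minimal sketch. It is not the implementation from the paper: the function names (`radix_partition`, `join_cluster`, `radix_join`), the single partitioning pass, the tuple row layout, and the use of a thread pool are all assumptions made for illustration. In CPython the global interpreter lock also limits parallel speedup, so the sketch shows only the structure of the technique, not its cache behavior or performance.

```python
from concurrent.futures import ThreadPoolExecutor


def radix_partition(rows, key_index, bits):
    """Split rows into 2**bits clusters using the low-order bits of the key's hash."""
    mask = (1 << bits) - 1
    clusters = [[] for _ in range(1 << bits)]
    for row in rows:
        clusters[hash(row[key_index]) & mask].append(row)
    return clusters


def join_cluster(build_rows, probe_rows, build_key, probe_key):
    """Hash-join one pair of clusters: build a table on one side, probe with the other."""
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    matches = []
    for row in probe_rows:
        for build_row in table.get(row[probe_key], ()):
            matches.append(build_row + row)
    return matches


def radix_join(left, right, left_key=0, right_key=0, bits=2, workers=4):
    """Partition both inputs with the same radix bits, then join matching clusters in threads."""
    left_clusters = radix_partition(left, left_key, bits)
    right_clusters = radix_partition(right, right_key, bits)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda pair: join_cluster(pair[0], pair[1], left_key, right_key),
            zip(left_clusters, right_clusters),
        )
        # Flatten the per-cluster results while the pool is still open.
        return [row for part in parts for row in part]
```

Because equal keys hash to the same cluster on both sides, each cluster pair can be joined independently, and each small hash table is more likely to stay cache-resident than a single table over the whole relation. Which cluster each row lands in depends on the hash function, so the order of the joined rows should not be relied on.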