As the performance gap between CPUs and main memory widens, system designers place caches between the CPU registers and main memory to bridge it. Because cache access is much faster than memory access, database operations must make full use of the cache to achieve high performance. The disk-based join is a common but time-consuming database query operation, yet most traditional join algorithms were designed without regard to the cache and therefore cannot fully exploit the CPU. This paper analyzes the low cache utilization of traditional join algorithms and proposes DBCC-Join, a disk-based cache-conscious join algorithm. Its central data structure, the join positional index pair table (JPIPT), records for each join result tuple the pair of positional indexes of its source tuples in their respective tables. DBCC-Join executes in two stages: a JPIPT construction stage and a result output stage. The JPIPT construction stage runs a cache-conscious algorithm over the join attributes, which are stored in a column-oriented layout, to build the JPIPT. The result output stage then uses the JPIPT to produce the join results with a single sequential scan of each data table. To the best of our knowledge, this paper is the first to exploit the cache to improve the performance of a disk-based join algorithm. Experiments show that DBCC-Join achieves a speedup of about an order of magnitude over traditional disk-based join algorithms.
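The two-stage structure described above can be illustrated with a small in-memory sketch. It is not the paper's implementation: the abstract does not specify the cache-conscious construction algorithm, so a hash-partitioned hash join over the join-attribute columns is used here as a stand-in, and the names `build_jpipt`, `output_results`, and `num_partitions` are hypothetical. The sketch also assumes the matched tuples fit in memory during the output stage.

```python
from collections import defaultdict


def build_jpipt(r_keys, s_keys, num_partitions=4):
    """Stage 1 (sketch): build the join positional index pair table.

    r_keys and s_keys are the join-attribute columns of the two tables.
    Returns a sorted list of (r_pos, s_pos) pairs for matching rows.
    Assumption: hash partitioning stands in for the paper's
    cache-conscious algorithm; each partition is meant to be small
    enough that its hash table stays cache-resident.
    """
    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for pos, key in enumerate(r_keys):
        r_parts[hash(key) % num_partitions].append((pos, key))
    for pos, key in enumerate(s_keys):
        s_parts[hash(key) % num_partitions].append((pos, key))

    jpipt = []
    for r_part, s_part in zip(r_parts, s_parts):
        # Build a small hash table on the R side of this partition.
        table = defaultdict(list)
        for r_pos, key in r_part:
            table[key].append(r_pos)
        # Probe it with the S side of the same partition.
        for s_pos, key in s_part:
            for r_pos in table.get(key, ()):
                jpipt.append((r_pos, s_pos))
    jpipt.sort()
    return jpipt


def output_results(r_rows, s_rows, jpipt):
    """Stage 2 (sketch): one sequential pass over each table.

    Only the rows whose positions appear in the JPIPT are kept,
    then the pairs are stitched together in JPIPT order.
    """
    needed_r = {r_pos for r_pos, _ in jpipt}
    needed_s = {s_pos for _, s_pos in jpipt}
    r_fetched = {pos: row for pos, row in enumerate(r_rows) if pos in needed_r}
    s_fetched = {pos: row for pos, row in enumerate(s_rows) if pos in needed_s}
    return [r_fetched[r_pos] + s_fetched[s_pos] for r_pos, s_pos in jpipt]


# Illustrative data: the first field of each row is the join attribute.
r_rows = [(1, "a"), (2, "b"), (3, "c")]
s_rows = [(3, "x"), (1, "y"), (1, "z")]
jpipt = build_jpipt([row[0] for row in r_rows], [row[0] for row in s_rows])
joined = output_results(r_rows, s_rows, jpipt)
```

Separating the two stages means the first stage touches only the narrow join-attribute columns, while the wide data rows are read once, in order, in the second stage.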