Scientific instruments and simulation programs are generating large amounts of multidimensional array data. Scientists commonly use queries with value and dimension subsetting conditions to find useful information in big array data, and data storage and indexing methods play an important role in answering such queries efficiently. In this paper, we propose SwiftArray, a new storage layout with indexing techniques to accelerate queries with value and dimension subsetting conditions. In SwiftArray, the multidimensional array is divided into blocks, and each block stores its values in sorted order. Blocks are placed in the order of a Hilbert space-filling curve to improve data locality for dimension subsetting queries. We propose a 2-D-Bin method to build an index over the blocks' value ranges, which efficiently avoids accessing unnecessary blocks for value subsetting queries. Our evaluations show that SwiftArray outperforms the NetCDF-4 format and the FastBit indexing technique for queries on multidimensional arrays.
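The layout described above can be illustrated with a minimal sketch. It is not the SwiftArray implementation: it assumes a square 2-D array whose block grid has a power-of-two side, and because the abstract does not specify how the 2-D-Bin index is built, a plain per-block (min, max) check stands in for it. All names (`hilbert_index`, `build_layout`, `query`) are hypothetical.

```python
from bisect import bisect_left, bisect_right


def hilbert_index(order, x, y):
    """Position of block (x, y) on a Hilbert curve over a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so consecutive positions stay adjacent.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def build_layout(array, block):
    """Cut a square 2-D list into block x block tiles, sort the values inside
    each tile, and order the tiles along the Hilbert curve."""
    n_blocks = len(array) // block
    order = n_blocks.bit_length() - 1  # assumption: n_blocks is a power of two
    tiles = []
    for by in range(n_blocks):
        for bx in range(n_blocks):
            cells = sorted(
                (array[by * block + j][bx * block + i], by * block + j, bx * block + i)
                for j in range(block)
                for i in range(block)
            )
            tiles.append({
                "pos": hilbert_index(order, bx, by),
                "origin": (by * block, bx * block),
                "values": [v for v, _, _ in cells],      # sorted values
                "coords": [(r, c) for _, r, c in cells],  # matching cell coordinates
            })
    tiles.sort(key=lambda t: t["pos"])
    return tiles


def query(tiles, block, lo, hi, rows, cols):
    """Cells with lo <= value <= hi inside half-open ranges rows=(r0, r1), cols=(c0, c1)."""
    hits = []
    for t in tiles:
        r0, c0 = t["origin"]
        # Dimension pruning: skip tiles outside the requested region.
        if r0 >= rows[1] or r0 + block <= rows[0] or c0 >= cols[1] or c0 + block <= cols[0]:
            continue
        # Value pruning (stand-in for 2-D-Bin): skip tiles whose range misses [lo, hi].
        if t["values"][-1] < lo or t["values"][0] > hi:
            continue
        # Sorted values let a binary search find the matching run.
        a = bisect_left(t["values"], lo)
        b = bisect_right(t["values"], hi)
        for v, (r, c) in zip(t["values"][a:b], t["coords"][a:b]):
            if rows[0] <= r < rows[1] and cols[0] <= c < cols[1]:
                hits.append((r, c, v))
    return hits
```

In this sketch a query such as `query(tiles, 2, 10, 20, (0, 4), (2, 8))` touches only the tiles that overlap the requested region and whose value range intersects [10, 20]; within each surviving tile, the sorted order turns the value condition into two binary searches.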