Existing methods for selecting compression strategies for HBase data do not consider whether the data is hot or cold, and their selection process is one-sided and unreliable. To address these shortcomings, a compression strategy selection method based on HBase data classification is proposed. HBase data is classified as hot or cold according to the access frequency of each data file, and each file is assigned a specific access level. On this basis, an evaluation layer is added that combines the neighbor-sector-based and statistic-column-based selection methods, yielding a compression strategy selection method based on data access level. Simulation results show that the proposed method not only saves storage space but also substantially improves the query performance of HBase data.
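The classification step can be illustrated with a minimal sketch. It is not the method evaluated here: the three access levels, the thresholds (in reads per day), and the level-to-codec mapping are all assumptions chosen for illustration, and the evaluation layer is not modeled. Only the codec names (`GZ`, `SNAPPY`, `LZ4`) correspond to compression algorithms that HBase supports.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical three-step access scale (not taken from the source)."""
    COLD = 0
    WARM = 1
    HOT = 2


# Assumed thresholds in reads per day; real cut-offs would be tuned per workload.
WARM_THRESHOLD = 10
HOT_THRESHOLD = 1000

# Assumed mapping: colder files get a denser but slower codec,
# hotter files get a faster codec with a lower compression ratio.
CODEC_BY_LEVEL = {
    AccessLevel.COLD: "GZ",
    AccessLevel.WARM: "SNAPPY",
    AccessLevel.HOT: "LZ4",
}


def classify(reads_per_day: float) -> AccessLevel:
    """Map a data file's access frequency to an access level."""
    if reads_per_day < 0:
        raise ValueError("reads_per_day must be non-negative")
    if reads_per_day >= HOT_THRESHOLD:
        return AccessLevel.HOT
    if reads_per_day >= WARM_THRESHOLD:
        return AccessLevel.WARM
    return AccessLevel.COLD


def select_codec(reads_per_day: float) -> str:
    """Return the codec name assigned to the file's access level."""
    return CODEC_BY_LEVEL[classify(reads_per_day)]
```

Under these assumed thresholds, a file read 5 times a day would be treated as cold and assigned `GZ`, while one read 5000 times a day would be treated as hot and assigned `LZ4`.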