DHMC: An Effective Parallel and Distributed Storage Structure for High-Dimensional Cubes
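  • ISSN: 1000-1239
  • Journal: Journal of Computer Research and Development (《计算机研究与发展》)
  • Date: 0
  • Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliations: [1] Department of Computer Science and Engineering, Yangzhou University, Yangzhou 225009; [2] School of Economics and Management, Southeast University, Nanjing 210096; [3] College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266510
  • Funding: National Natural Science Foundation of China (60773103, 60473012, 70472033); National Science and Technology Infrastructure Platform Fund (2004DKA20310); Natural Science Foundation of Jiangsu Province (BK2005047, BK2005046); Jiangsu Province "Qing Lan Project" Fund.

The pre-computation of data cubes is critical for improving the response time of OLAP systems and accelerating data mining tasks. In a high-dimensional cube, however, it might not be practical to build all of the cuboids. This paper proposes a novel approach that partitions the high-dimensional cube into low-dimensional shell segment mini-cubes. The approach permits a significant reduction of CPU and I/O overhead for many queries by restricting the number of cube segments to be processed for both the fact table and the bitmap indices. Experimental results show that the proposed method is significantly more efficient than the other existing cubing methods.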
Chinese abstract:

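In data warehouse systems, the data cube and its pre-aggregation play a very important role in OLAP. A d-dimensional data cube can generate $2^d$ aggregate cuboids and $\prod_{i=1}^{d}(|D_i|+1)$ aggregate cells, but for a high-dimensional cube it is impractical to build all of these aggregate cuboids. This paper proposes DHMC, a parallel and distributed storage structure for high-dimensional cubes based on shared segment mini-cubes. DHMC partitions a high-dimensional cube into several low-dimensional shared segment mini-cubes and uses parallel and distributed processing to build these mini-cubes and their aggregate cuboids. This enables parallel construction and incremental update maintenance of the high-dimensional cube, and thereby addresses the storage and querying of massive aggregate data in high-dimensional OLAP. Both theoretical analysis and experimental results show that DHMC performs best among the methods compared.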
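The size argument in this abstract can be checked with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, and the segmented count assumes disjoint, equally sized segments with the all-aggregated (apex) cuboid excluded from each segment, which the abstract does not specify.

```python
from math import prod


def full_cube_size(cardinalities):
    """Return (cuboid count, aggregate cell count) for a full cube.

    cardinalities[i] is |D_i|, the number of distinct values in dimension i.
    """
    d = len(cardinalities)
    # 2^d cuboids; each dimension contributes |D_i| values plus "all".
    return 2 ** d, prod(c + 1 for c in cardinalities)


def segmented_cuboid_count(d, segment_size):
    """Cuboids built when d dimensions are split into disjoint segments.

    Assumption (not from the paper): each segment is cubed on its own
    and its apex cuboid is not counted.
    """
    full_segments, rest = divmod(d, segment_size)
    count = full_segments * (2 ** segment_size - 1)
    if rest:
        count += 2 ** rest - 1
    return count
```

For example, `full_cube_size([2, 3, 4])` returns `(8, 60)`, while `segmented_cuboid_count(60, 3)` returns `140`, compared with the 2^60 cuboids of a full 60-dimensional cube.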

English abstract:

The data cube and its pre-computation play an essential role in fast OLAP (online analytical processing) in many data warehouses. A cube with d dimensions can generate $2^d$ cuboids and $\prod_{i=1}^{d}(|D_i| + 1)$ aggregate cells. In a high-dimensional cube, however, it might not be practical to build all of these cuboids. In this paper, a novel parallel and distributed storage structure based on shell segment mini-cubes (DHMC) is proposed for high-dimensional cubes. DHMC partitions the high-dimensional cube into low-dimensional shell segment mini-cubes. OLAP queries are computed online by dynamically constructing cuboids from these shell segment mini-cubes through the parallel and distributed processing system. With this design, the total space needed to store the shell segment mini-cubes for high-dimensional OLAP is negligible in comparison with that of a full high-dimensional cube. The approach permits a significant reduction of CPU and I/O overhead for many queries by restricting the number of cube segments to be processed for both the fact table and the bitmap indices. The proposed data allocation and processing model supports parallel I/O and parallel processing, as well as load balancing across disks and processors. The shell mini-cube methods are compared with existing alternatives such as the full cube and the partial cube. The analytical and experimental results show that the proposed DHMC algorithms are more efficient than the existing ones.
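The idea of answering queries online from low-dimensional segment mini-cubes can be sketched as follows. This is not the paper's algorithm: it is a single-process illustration that assumes each mini-cube cell stores the set of matching fact-table row ids (plain Python sets standing in for bitmap indices), and it omits the parallel and distributed allocation entirely. The names `build_mini_cubes` and `count_query` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_mini_cubes(rows, segments):
    """Build one mini-cube per segment of dimension indices.

    Each mini-cube maps a cell key ((dim, value), ...) to the set of
    row ids in the fact table that fall into that cell.
    """
    cubes = []
    for seg in segments:
        cells = defaultdict(set)
        for tid, row in enumerate(rows):
            # Materialize every non-empty cuboid inside this segment.
            for k in range(1, len(seg) + 1):
                for dims in combinations(seg, k):
                    cells[tuple((d, row[d]) for d in dims)].add(tid)
        cubes.append(cells)
    return cubes


def count_query(rows, segments, cubes, query):
    """COUNT rows matching query (dim index -> value) across segments."""
    result = set(range(len(rows)))
    for seg, cells in zip(segments, cubes):
        # Only the dimensions this segment owns contribute to its key.
        key = tuple((d, query[d]) for d in seg if d in query)
        if key:
            result &= cells.get(key, set())
    return len(result)
```

A query that touches dimensions from two segments is answered by looking up one cell in each mini-cube and intersecting the row-id sets, so segments that own none of the queried dimensions are never read.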

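Journal information
  • Journal of Computer Research and Development (《计算机研究与发展》)
  • China Science and Technology Core Journal
  • Supervising body: Chinese Academy of Sciences
  • Sponsor: Institute of Computing Technology, Chinese Academy of Sciences
  • Editor-in-chief: Xu Zhiwei (徐志伟)
  • Address: Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Beijing
  • Postal code: 100190
  • Email: crad@ict.ac.cn
  • Phone: 010-62620696, 62600350
  • International standard serial number: ISSN 1000-1239
  • Domestic unified serial number: CN 11-1777/TP
  • Postal distribution code: 2-654
  • Awards:
  • One of the 100 Outstanding Chinese Academic Journals, 2001-2007; 2008 中国精品科...; "Double Effect" journal of the China Periodical Matrix
  • Indexed in domestic and international databases:
  • Abstracts Journal (Russia); Abstract and Citation Database (Netherlands); Engineering Index (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)
  • Citations: 40349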