A Columnar OLAP Query Execution Engine Based on Triplet Storage

ISSN: 1000-9825
Journal: Journal of Software (《软件学报》)
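Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education (Renmin University of China), Beijing 100872; [2] School of Information, Renmin University of China, Beijing 100872; [3] National Survey Research Center at Renmin University of China, Beijing 100872
Funding: National Science and Technology Major Project (Core Electronic Devices, High-end Generic Chips and Basic Software) (2010ZX01042-001-002); National Natural Science Foundation of China (61272138, 61232007); Graduate Scientific Research Fund of Renmin University of China (13XNH216)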

Authors: Zhu Yue'an (朱阅岸)[1,2], Zhang Yansong (张延松)[1,2,3], Zhou Xuan (周烜)[1,2], Wang Shan (王珊)[1,2]

Keywords: big data analysis, OLAP (online analytical processing), main-memory columnar database, join algorithm, materialization strategy

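Chinese abstract (translated):

Combining big data with traditional data warehouse technology creates a demand for real-time big data analysis (volume + velocity). A data warehouse in a big data setting therefore cannot rely heavily on optimization techniques with high storage costs, such as materialization and indexing; it must instead improve its real-time processing capability to cope with the large data volumes and complex analytical queries typical of big data analysis. These queries generally take the form of grouping and aggregation over the result of joins between a fact table and dimension tables, so joins and group-by aggregation are the two key determinants of ROLAP (relational OLAP) performance. This paper studies the performance of OLAP queries over large-scale data on new hardware platforms and designs a new column-store OLAP query execution engine, CDDTA-MMDB (columnar direct dimensional tuple access, main memory database query execution engine). Its triplet-based materialization strategy lets CDDTA-MMDB reduce the number of accesses to base tables and intermediate data structures during joins on an in-memory column-store model. First, CDDTA-MMDB decomposes a query into sub-queries on the dimension tables and the fact table. A sub-query that involves only filtering produces (surrogate key, Boolean value) pairs; otherwise it produces (surrogate key, key, value) triplets. Then a single scan of the fact table suffices: the foreign-key mapping function of the fact table directly locates the corresponding triplet or pair, and the filter, join, or aggregation is carried out. CDDTA-MMDB takes full account of the design principles of in-memory column-store databases and minimizes random memory accesses. Experimental results show that CDDTA-MMDB is efficient: compared with representative column-store databases, it is 2.5 times faster than MonetDB 5.5 and 5 times faster than the invisible join of C-Store. CDDTA-MMDB also achieves linear speedup on multi-core processors.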

English abstract:

Integrating big data with traditional data warehouse (DW) techniques brings a demand for real-time big data analysis. This demand means that a DW cannot depend too heavily on optimizations such as materialization and indexing, which consume large amounts of space; instead it needs stronger real-time analysis capability to handle big data workloads, which usually issue complex queries over huge data volumes. Those queries usually consist of applying grouping or aggregation operators to the join result between the fact table and dimension table(s). The join and grouping operations are therefore often the bottlenecks for performance improvement. This paper studies OLAP performance on new hardware platforms in a big data environment and develops a new OLAP query execution engine for columnar storage, called CDDTA-MMDB (columnar direct dimensional tuple access for main memory database query execution engine). The optimized materialization lets CDDTA-MMDB reduce accesses to base tables and intermediate data structures during the join procedure. CDDTA-MMDB decomposes the query into sub-queries on the fact table and the dimension tables respectively. If a sub-query on a dimension table serves only as a filter, it produces binary tuples of the form (surrogate, Boolean_value); otherwise, it produces triplets of the form (surrogate, key, value). Thus, by scanning the fact table in a single pass and using the mapping function of the foreign keys in the fact table to access the binary tuples or triplets directly, the executor can accomplish the join, filter, and grouping operations. The design fully considers the principles of main-memory columnar databases and keeps random memory accesses to a minimum. Experimental results show that the system is efficient: it can be 2.5 times faster than MonetDB 5.5 and 5 times faster than the invisible join used by C-Store. Moreover, it scales linearly on multi-core processors.
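
The execution strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the CDDTA-MMDB implementation: the toy star schema, the function names (filter_subquery, grouping_subquery, run_query), and the reading of the triplet's "key" as a small group id and its "value" as the grouping attribute are all assumptions made for illustration. Surrogate keys are assumed to be dense integers starting at 0, so that a fact-table foreign key can be used directly as an array position.

```python
# Hypothetical toy star schema, stored column-wise. Surrogate keys are assumed
# to be dense integers 0..n-1, so a foreign key doubles as an array index.
customer_region = ["ASIA", "EUROPE", "ASIA", "AMERICA"]  # customer dimension
date_year = [1996, 1997, 1997, 1998]                      # date dimension
fact_custkey = [0, 1, 2, 3, 2, 0]
fact_datekey = [1, 1, 2, 3, 3, 0]
fact_revenue = [10, 20, 30, 40, 50, 60]


def filter_subquery(column, predicate):
    """Filter-only dimension sub-query: emit (surrogate, Boolean) pairs."""
    return [(sk, predicate(v)) for sk, v in enumerate(column)]


def grouping_subquery(column, predicate):
    """Dimension sub-query that also feeds GROUP BY: emit (surrogate, key,
    value) triplets for qualifying rows only. Assumption: `key` is a small
    group id and `value` is the grouping attribute."""
    group_ids = {}
    triplets = []
    for sk, v in enumerate(column):
        if predicate(v):
            gid = group_ids.setdefault(v, len(group_ids))
            triplets.append((sk, gid, v))
    return triplets


def run_query():
    # Roughly: SELECT d.year, SUM(f.revenue) FROM fact f, customer c, date d
    #          WHERE c.region = 'ASIA' AND d.year >= 1997 GROUP BY d.year
    pairs = filter_subquery(customer_region, lambda r: r == "ASIA")
    triplets = grouping_subquery(date_year, lambda y: y >= 1997)

    # Lay the sub-query results out by surrogate key for positional access.
    cust_ok = [False] * len(customer_region)
    for sk, flag in pairs:
        cust_ok[sk] = flag
    date_gid = [None] * len(date_year)
    labels = {}
    for sk, gid, year in triplets:
        date_gid[sk] = gid
        labels[gid] = year

    # Single pass over the fact table: filter, join and aggregate together.
    sums = [0] * len(labels)
    for ck, dk, rev in zip(fact_custkey, fact_datekey, fact_revenue):
        if not cust_ok[ck]:
            continue              # customer filtered out by the Boolean pair
        gid = date_gid[dk]
        if gid is None:
            continue              # date row produced no triplet
        sums[gid] += rev
    return {labels[g]: s for g, s in enumerate(sums)}


print(run_query())
```

In this sketch the fact table is read exactly once, and each foreign key is resolved by array indexing rather than by probing a hash table, which is one way to read the abstract's "mapping function of foreign keys". A real engine would work on typed column vectors rather than Python lists.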

Journal information

Journal of Software (《软件学报》)
Peking University Core Journal (2011 edition)

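Supervising body: Chinese Academy of Sciences
Sponsors: Institute of Software, Chinese Academy of Sciences; China Computer Federation
Editor-in-chief: Zhao Chen (赵琛)
Address: P.O. Box 8718, Beijing, Institute of Software, Chinese Academy of Sciences
Postal code: 100190
Email: jos@iscas.ac.cn
Phone: 010-62562563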

International standard serial number: ISSN 1000-9825
Domestic serial number: CN 11-2560/TP
Postal distribution code: 82-367

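Awards:
Selected as a "Double Hundred Journal" of the China Periodical Matrix in 2001; awarded First Prize for Outstanding Science and Technology Journals of the Chinese Academy of Sciences in 2000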

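Indexed in the following domestic and international databases:
Abstracts Journal (Russia), Mathematical Reviews, online edition (USA), Index Copernicus (Poland), Zentralblatt MATH (Germany), Abstract and Citation Database (Netherlands), Engineering Index (USA), Cambridge Scientific Abstracts (USA), Science Abstracts database (UK), Japan Science and Technology Agency database (Japan), China Science and Technology Core Journals (China), Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)

Citation count: 54609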