Application of an Improved SDD-1 Algorithm in Hive

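ISSN: 1000-5900
Journal: Natural Science Journal of Xiangtan University (《湘潭大学自然科学学报》)
Date: 0
Classification: TP323 [Automation and Computer Technology - Computer System Architecture; Automation and Computer Technology - Computer Science and Technology]
Author affiliations: [1] School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013; [2] Department of Electrical and Information Engineering, Jiangsu Provincial Transportation Technician College (江苏省交通技师学院), Zhenjiang, Jiangsu 212006
Funding: National Natural Science Foundation of China (61072002)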

Keywords: data preprocessing, double semi-join, improved SDD-1 algorithm

Abstract:

To address the long execution time and high bandwidth consumption of join queries in Hive, this paper proposes an improved SDD-1 algorithm based on data preprocessing and double semi-joins. First, a preprocessing step merge-sorts the raw data at each distributed node, which reduces the number of data-mapping operations at the aggregation node and speeds up processing. Second, a double semi-join technique based on rows and columns further reduces the volume of data transferred between nodes and thus the bandwidth consumed. Simulation experiments show that, compared with the original Hive join algorithm, the improved algorithm raises query speed by 10% when the number of tuples reaches 5,000 and 8,000, effectively shortening query processing and response times. The improved algorithm can also be readily applied to other cloud computing platforms.
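The two steps named in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: the abstract does not spell out the algorithm, so the code assumes one plausible reading in which "column" reduction means keeping only the columns a query needs, "row" reduction means semi-joins applied in both directions, and preprocessing means each node sorts its local rows on the join key so the final join is a linear merge. All function names, parameters, and the sample data are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def presort(rows: List[Row], key: str) -> List[Row]:
    """Preprocessing (assumed): a node sorts its local rows on the join key."""
    return sorted(rows, key=lambda r: r[key])


def project(rows: List[Row], cols: List[str]) -> List[Row]:
    """Column reduction (assumed): keep only the columns the query needs."""
    return [{c: r[c] for c in cols} for r in rows]


def join_keys(rows: List[Row], key: str) -> List[Any]:
    """The distinct join-key values; this is all a semi-join has to ship."""
    return sorted({r[key] for r in rows})


def semi_join(rows: List[Row], key: str, keys: List[Any]) -> List[Row]:
    """Row reduction: keep rows whose join key occurs in `keys`."""
    keyset = set(keys)
    return [r for r in rows if r[key] in keyset]


def merge_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Join two inputs that are already sorted on `key`."""
    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every matching left row with that run.
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def double_semi_join(r_rows: List[Row], s_rows: List[Row], key: str,
                     r_cols: List[str], s_cols: List[str]) -> List[Row]:
    """Reduce both relations before joining (r_cols and s_cols include `key`)."""
    r_local = presort(project(r_rows, r_cols), key)  # done at R's node
    s_local = presort(project(s_rows, s_cols), key)  # done at S's node
    # First semi-join: R's keys travel to S's node and filter S.
    s_reduced = semi_join(s_local, key, join_keys(r_local, key))
    # Second semi-join: the surviving S keys travel back and filter R.
    r_reduced = semi_join(r_local, key, join_keys(s_reduced, key))
    # Only the reduced, still-sorted relations reach the joining node.
    return merge_join(r_reduced, s_reduced, key)


orders = [
    {"uid": 3, "item": "pen", "note": "gift"},
    {"uid": 1, "item": "ink", "note": "rush"},
    {"uid": 7, "item": "cap", "note": "none"},
]
users = [
    {"uid": 1, "name": "Ann", "bio": "a"},
    {"uid": 2, "name": "Bo", "bio": "b"},
    {"uid": 3, "name": "Cy", "bio": "c"},
]
print(double_semi_join(orders, users, "uid", ["uid", "item"], ["uid", "name"]))
```

In this toy run the order for uid 7 and the user with uid 2 are filtered out before any full row is transferred, and the unused `note` and `bio` columns are never shipped. That is the kind of traffic reduction the abstract attributes to the double semi-join, though the 10% figure reported there comes from the paper's own experiments and not from this sketch.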
