Research on Predicate-Based Sampling Techniques for Big Data

ISSN: 1674-8425
Journal: Journal of Chongqing University of Technology (Natural Science) (《重庆理工大学学报：自然科学版》)
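Classification: TP392 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Computer, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan, Guangdong 528400; [2] Department of Media, Guangzhou Huali Science and Technology Vocational College, Guangzhou 511325; [3] School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054
Funding: National Natural Science Foundation of China, Youth Science Fund project (61300095); Merit-Based Funding Program for Science and Technology Activities of Overseas Returnees, "Research and Development of Business Intelligence Application Software" (2009CR02)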

Authors: Jiang Qun (姜群) [1,2,3], Fu Yu (傅瑜) [1], Li Wensheng (李文生) [1], Liang Ruishi (梁瑞仕) [1], Yang Wu (杨武) [3]

Keywords: sampling, dynamic, predicate

Abstract (translated from the Chinese):

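To address the big data sampling problem, MapReduce is used to produce a fixed-size sample whose content satisfies a given predicate. The default Hadoop [1] configuration is extended so that a job can dynamically manage its resource consumption on demand, which addresses the resource waste that occurs during MapReduce processing. Experimental results show that the proposed policy outperforms default Hadoop, demonstrating that MapReduce is a feasible and effective way to solve the big data sampling problem.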

English abstract:

To address the big data sampling problem, this paper uses MapReduce to sample big data and produce a sample whose content satisfies a given predicate. Because default Hadoop execution depends on the size of the input and wastes cluster resources, the paper extends default Hadoop so that a job can dynamically manage its resource consumption on the cluster according to its demand. Experimental results show that the proposed policy performs better than the default Hadoop policy, which indicates that sampling big data with MapReduce is feasible and effective.
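The idea described in the abstracts can be illustrated with a small sketch. It is not the authors' implementation and does not use Hadoop. It only simulates, in plain Python, a job that consumes input splits incrementally and stops once a fixed-size sample of records satisfying a predicate has been collected, instead of scanning the whole input. The names `map_split` and `sample_with_predicate`, the representation of splits as in-memory lists, and the random split order are all assumptions made for illustration.

```python
import random
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def map_split(split: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Map phase (hypothetical): emit only the records of one split that satisfy the predicate."""
    return (record for record in split if predicate(record))


def sample_with_predicate(
    splits: Sequence[Sequence[T]],
    predicate: Callable[[T], bool],
    k: int,
    seed: int = 0,
) -> Tuple[List[T], int]:
    """Collect up to k matching records, reading splits only as long as more are needed.

    Returns the sample and the number of splits that were actually processed.
    """
    rng = random.Random(seed)
    order = list(range(len(splits)))
    rng.shuffle(order)  # visit splits in random order so the sample is not tied to file order

    sample: List[T] = []
    processed = 0
    for idx in order:
        if len(sample) >= k:
            break  # demand is met: the remaining splits are never scheduled
        processed += 1
        for record in map_split(splits[idx], predicate):
            sample.append(record)
            if len(sample) == k:
                break
    return sample, processed


if __name__ == "__main__":
    # 50 splits of 100 integers each; the predicate keeps even numbers.
    data = [list(range(i * 100, (i + 1) * 100)) for i in range(50)]
    sample, processed = sample_with_predicate(data, lambda x: x % 2 == 0, k=120)
    print(len(sample), processed)
```

In this toy input every split contains 50 matching records, so a request for 120 records is satisfied after three of the 50 splits. A static job that always reads the full input would process all 50, which is the kind of waste the abstracts attribute to default Hadoop execution.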
