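To address the low efficiency of MapReduce when processing multiple queries, this paper proposes Shareopt, a MapReduce query optimization method based on query sharing. It analyzes the operation patterns of all queries to find the sub-queries they share, constructs an execution-plan directed acyclic graph (DAG) according to the execution order of the sub-queries, and finally determines the overall execution plan for the group of queries. A comparison with Hive and Pig verifies that the method effectively reduces the number of execution steps and improves query execution efficiency while preserving accuracy.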
To improve the efficiency of multi-query processing in MapReduce, this paper proposes a multi-query optimization approach based on sub-query sharing and merging. First, it analyzes the patterns of all the queries and identifies the sub-queries that offer sharing opportunities and can be merged. Next, it constructs a directed acyclic graph (DAG) according to the execution sequences of the sub-queries. It then adds the non-shared sub-queries to the DAG and obtains the overall execution plan for the queries. Experimental results show that the approach effectively eliminates unnecessary re-computation and reduces query execution time.
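The idea of merging shared sub-queries into a single DAG can be illustrated with a minimal sketch. This is not the paper's Shareopt implementation. It assumes, purely for illustration, that each query is given as an ordered list of sub-query signatures (strings) and that two sub-queries are shareable only when their entire chain of preceding steps is identical. The function name `build_shared_plan` and the sample queries are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_shared_plan(queries):
    """Merge per-query step lists into one DAG and return (order, shared).

    queries: dict mapping a query name to its sub-query signatures in
    execution order (an assumed, simplified representation).
    """
    preds = defaultdict(set)   # node -> set of nodes that must run first
    users = defaultdict(set)   # node -> names of the queries that need it
    for name, steps in queries.items():
        prev = None
        for i in range(len(steps)):
            # A node is the full prefix of steps, so only identical work
            # over identical input is merged into one node.
            node = tuple(steps[: i + 1])
            preds[node]  # touch the key so root nodes are registered too
            if prev is not None:
                preds[node].add(prev)
            users[node].add(name)
            prev = node
    # static_order() yields every node after all of its predecessors.
    order = list(TopologicalSorter(preds).static_order())
    shared = {node for node, names in users.items() if len(names) > 1}
    return order, shared


# Hypothetical workload: q1 and q2 share their first two steps.
queries = {
    "q1": ["scan(orders)", "filter(year=2012)", "group(customer)"],
    "q2": ["scan(orders)", "filter(year=2012)", "join(customers)"],
    "q3": ["scan(items)", "group(category)"],
}
order, shared = build_shared_plan(queries)
naive_steps = sum(len(steps) for steps in queries.values())
print(f"steps without sharing: {naive_steps}, with sharing: {len(order)}")
```

In this sketch the non-shared sub-queries simply become nodes with a single user, so they enter the same DAG as the shared ones, and any topological order of that DAG is a valid overall execution plan. A real system would also need to decide when two sub-queries are semantically equivalent, which plain string matching does not capture.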