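MapReduce is a parallel programming model for processing and generating large data sets. Its scheduling and fault tolerance mechanisms are an important part of the model. By analyzing the execution process of the MapReduce model, we extract its scheduling and fault tolerance models. We then apply scheduling ideas commonly used in P2P models to the MapReduce scheduling model and modify the original scheduling and fault tolerance mechanisms accordingly.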
MapReduce is a parallel programming model for processing and generating large data sets. Its scheduling and fault tolerance strategies play an important role in the execution of MapReduce. By analyzing that execution, we derive a model of its scheduling and fault tolerance. Based on this model, and drawing on strategies commonly used in P2P models, we propose three new scheduling strategies. For each of them, we derive a corresponding fault tolerance mechanism by modifying the original fault tolerance method of MapReduce.