To store datasets selectively when storage space is limited or storage cost matters, a heuristic workflow scheduling algorithm is proposed that takes the characteristics of the cloud environment into account. The algorithm selectively stores intermediate datasets and, subject to the user's required workflow completion time, combines Spot Instances (SI) with On-demand Instances (OI). Tasks are grouped and scheduled according to their global weights, and the storage of intermediate datasets is managed by estimating the cost of storing each dataset against the cost of regenerating it. A cloud environment was simulated, and experiments were designed to compare the algorithm with other scheduling and storage strategies. The results show that, in an environment where cloud instance prices change dynamically, the proposed algorithm has certain advantages in maintaining the workflow completion rate and in reducing the total cost incurred by scheduling.
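The trade-off between storage cost and regeneration cost mentioned above can be illustrated with a minimal sketch. The abstract does not give the actual cost model, so everything here is an assumption made for illustration: the `Dataset` fields, the per-month cost rates, the prices, and the simple rule "store a dataset when storing it is no more expensive than regenerating it on demand" are hypothetical and are not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of one intermediate dataset."""
    name: str
    size_gb: float         # size of the dataset in GB
    regen_hours: float     # compute time needed to regenerate it once
    uses_per_month: float  # expected number of future accesses per month


def storage_cost_rate(ds: Dataset, price_per_gb_month: float) -> float:
    """Assumed monthly cost of keeping the dataset in cloud storage."""
    return ds.size_gb * price_per_gb_month


def regeneration_cost_rate(ds: Dataset, instance_price_per_hour: float) -> float:
    """Assumed monthly cost of regenerating the dataset on every access."""
    return ds.regen_hours * instance_price_per_hour * ds.uses_per_month


def should_store(ds: Dataset, price_per_gb_month: float,
                 instance_price_per_hour: float) -> bool:
    """Store the dataset only if storing is no more expensive than regenerating."""
    return (storage_cost_rate(ds, price_per_gb_month)
            <= regeneration_cost_rate(ds, instance_price_per_hour))


if __name__ == "__main__":
    d = Dataset(name="d1", size_gb=100, regen_hours=2, uses_per_month=5)
    # Illustrative prices only: a cheap instance favours regeneration,
    # a more expensive instance favours storage.
    for hourly_price in (0.1, 0.5):
        decision = "store" if should_store(d, 0.02, hourly_price) else "regenerate"
        print(f"instance price {hourly_price}/h -> {decision}")
```

Because the instance price is a parameter, the same dataset can flip between "store" and "regenerate" as prices move, which is one reason a storage strategy in a dynamically priced environment would need to re-estimate both costs rather than decide once.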