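Cloud computing offers a powerful and efficient solution for big data processing. In this paradigm, a data manager (DM) can rent multiple datacenters to process geographically dispersed data in real time. However, because data are generated dynamically and resource prices fluctuate, a DM that wants to process multi-source data at low cost must decide which datacenters to migrate the data to and how much computing resource to provision for processing them. First, this problem is formulated as a joint stochastic optimization problem. Then, the Lyapunov optimization framework is used to decompose the original problem into two independent subproblems, which are solved separately. Finally, an online algorithm is designed based on these solutions. Theoretical analysis shows that the proposed algorithm can approach the offline optimal solution arbitrarily closely while guaranteeing the data processing delay. Experiments on the WorldCup98 and Youtube datasets verify the correctness of the theoretical analysis and the superiority of the proposed method.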
Cloud computing has been shown to provide a cost-effective and powerful platform for big data processing. Under this paradigm, a data manager (DM) usually rents geographically distributed datacenters to process its geographically dispersed data sets, owing to the convenience and economy of doing so. Because the data sets are generated dynamically and resource prices vary over time, deciding how to move the data from different geographic locations to different datacenters, while provisioning suitable computation resources to process them, is critical to cost-effectiveness. In this paper, a joint stochastic optimization problem is first formulated, and the Lyapunov optimization framework is then used to decouple it into two independent subproblems that admit efficient solutions. Based on these solutions, an online algorithm is developed. Theoretical analysis shows that the proposed online algorithm produces a solution that is arbitrarily close to the offline optimal solution while providing guarantees on the data processing delay. Experiments on the WorldCup98 and Youtube datasets validate the theoretical analysis and demonstrate the superiority of the new approach.
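The decomposition described above can be illustrated with a generic Lyapunov drift-plus-penalty sketch. The model is an assumption for illustration only, not the paper's formulation: time is slotted; each datacenter d keeps a backlog Q_d of unprocessed data; moving one unit of data from source s to datacenter d costs c_{s,d}(t); renting one server at d costs p_d(t), processes mu units per slot, and at most N_d servers can be rented. Under this linear cost model, minimizing the per-slot bound V·cost + Σ_d Q_d·(inflow_d − mu·n_d) splits into an independent routing rule per source and a provisioning rule per datacenter. The function names (`route_data`, `provision_servers`, `step`, `slot_cost`) and all parameter values are hypothetical.

```python
from __future__ import annotations

import random


def route_data(arrivals, move_price, backlog, V):
    """Routing subproblem (hypothetical linear model).

    Sending lambda units from source s to datacenter d contributes
    (V * move_price[s][d] + backlog[d]) * lambda to the drift-plus-penalty
    bound. Under sum_d lambda_{s,d} = arrivals[s], the minimizer sends all of
    source s's new data to the datacenter with the smallest such weight.
    """
    routing: dict[tuple[int, int], float] = {}
    for s, amount in enumerate(arrivals):
        best = min(range(len(backlog)),
                   key=lambda d: V * move_price[s][d] + backlog[d])
        routing[(s, best)] = amount
    return routing


def provision_servers(backlog, server_price, max_servers, mu, V):
    """Provisioning subproblem (hypothetical linear model).

    For datacenter d the term is (V * server_price[d] - backlog[d] * mu) * n_d
    with 0 <= n_d <= max_servers[d]. It is linear in n_d, so the minimum is at
    an endpoint: rent all servers only when backlog pressure outweighs price.
    """
    return [n_max if q * mu > V * p else 0
            for q, p, n_max in zip(backlog, server_price, max_servers)]


def slot_cost(routing, servers, move_price, server_price):
    """Penalty term for one slot: data-movement cost plus server rental cost."""
    move = sum(move_price[s][d] * x for (s, d), x in routing.items())
    rent = sum(p * n for p, n in zip(server_price, servers))
    return move + rent


def step(backlog, arrivals, move_price, server_price, max_servers, mu, V):
    """One online slot: solve both subproblems with Q(t), then update queues."""
    routing = route_data(arrivals, move_price, backlog, V)
    servers = provision_servers(backlog, server_price, max_servers, mu, V)
    new_backlog = []
    for d, q in enumerate(backlog):
        inflow = sum(x for (_, dst), x in routing.items() if dst == d)
        # Q_d(t+1) = max(Q_d(t) - mu * n_d(t), 0) + inflow_d(t)
        new_backlog.append(max(q - mu * servers[d], 0.0) + inflow)
    return new_backlog, routing, servers


if __name__ == "__main__":
    # Synthetic random arrivals and prices; purely illustrative.
    random.seed(0)
    S, D, T = 2, 3, 1000
    mu, max_servers = 5.0, [4, 4, 4]
    for V in (1.0, 10.0, 100.0):
        backlog, total_cost, total_backlog = [0.0] * D, 0.0, 0.0
        for _ in range(T):
            arrivals = [random.uniform(0, 10) for _ in range(S)]
            move_price = [[random.uniform(0.1, 1.0) for _ in range(D)]
                          for _ in range(S)]
            server_price = [random.uniform(0.5, 2.0) for _ in range(D)]
            backlog, routing, servers = step(backlog, arrivals, move_price,
                                             server_price, max_servers, mu, V)
            total_cost += slot_cost(routing, servers, move_price, server_price)
            total_backlog += sum(backlog)
        print(f"V={V:6.1f}  avg cost={total_cost / T:7.2f}  "
              f"avg backlog={total_backlog / T:8.2f}")
```

In the standard drift-plus-penalty analysis, the control parameter V trades cost against backlog: the time-average cost is within O(1/V) of the optimum, while the time-average backlog (and hence delay) grows as O(V). This is the kind of cost/delay trade-off the abstract refers to. The all-or-nothing provisioning rule is a consequence of the assumed linear prices; a nonlinear cost model would need a different per-slot solver.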