The main challenge in the area of reinforcement learning is scaling up to larger and more complex problems. To tackle this scaling problem, a scalable reinforcement learning method, DCS-SRL, is proposed on the basis of a divide-and-conquer strategy, and its convergence is proved. In this method, a learning problem in a large or continuous state space is decomposed into multiple smaller subproblems. Given a specific learning algorithm, each subproblem can be solved independently with limited available resources. In the end, the component solutions can be recombined to obtain the desired result. To address the question of how subproblems are prioritized in the scheduler, a weighted priority scheduling algorithm is proposed. This scheduling algorithm ensures that computation is focused on the regions of the problem space that are expected to be most productive. To expedite the learning process, a new parallel method, called DCS-SPRL, is derived by combining DCS-SRL with a parallel scheduling architecture. In the DCS-SPRL method, the subproblems are distributed among processors that can work in parallel. The experimental results show that learning based on DCS-SPRL has a fast convergence speed and good scalability.
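To make the decompose-solve-recombine loop concrete, the following is a minimal Python sketch of the general idea, not the paper's actual DCS-SRL algorithm. It assumes tabular Q-learning inside each state-space region and, as an illustrative choice in the spirit of prioritized sweeping, uses the magnitude of the most recent Bellman update as the scheduling weight. All names here (Subproblem, sweep, dcs_srl, step_fn) and all parameter values are hypothetical.

import heapq
import random


class Subproblem:
    """One small region of the state space, solved independently (illustrative)."""

    def __init__(self, states, actions, step_fn, alpha=0.1, gamma=0.95):
        self.states = states            # assumed disjoint from other regions
        self.actions = actions
        self.step_fn = step_fn          # (state, action) -> (next_state, reward)
        self.alpha, self.gamma = alpha, gamma
        self.q = {(s, a): 0.0 for s in states for a in actions}
        self.priority = float("inf")    # unexplored subproblems are scheduled first

    def sweep(self, episodes=10):
        """Run a few Q-learning episodes; return the largest update as the new weight."""
        max_delta = 0.0
        for _ in range(episodes):
            s = random.choice(self.states)
            for _ in range(20):                      # bounded episode length
                a = random.choice(self.actions)      # pure exploration, for brevity
                s2, r = self.step_fn(s, a)
                target = r
                if s2 in self.states:                # bootstrap only inside this region;
                    target += self.gamma * max(      # in the real method, boundary values
                        self.q[(s2, b)] for b in self.actions)  # come from recombination
                delta = target - self.q[(s, a)]
                self.q[(s, a)] += self.alpha * delta
                max_delta = max(max_delta, abs(delta))
                if s2 not in self.states:
                    break
                s = s2
        self.priority = max_delta
        return max_delta


def dcs_srl(subproblems, budget=100, tol=1e-3):
    """Weighted-priority scheduling: always refine the subproblem whose last
    sweep was most productive (largest recent update)."""
    heap = [(-sp.priority, i) for i, sp in enumerate(subproblems)]
    heapq.heapify(heap)
    for _ in range(budget):
        neg_p, i = heapq.heappop(heap)
        if -neg_p < tol:                 # every remaining subproblem has converged
            break
        p = subproblems[i].sweep()
        heapq.heappush(heap, (-p, i))
    # Recombine the component solutions into one global Q-table.
    merged = {}
    for sp in subproblems:
        merged.update(sp.q)
    return merged


if __name__ == "__main__":
    # Toy 1-D chain: states 0..19, actions move left/right, reward 1 at state 19.
    def step(s, a):
        s2 = max(0, min(19, s + a))
        return s2, (1.0 if s2 == 19 else 0.0)

    regions = [list(range(0, 10)), list(range(10, 20))]   # assumed-disjoint split
    subs = [Subproblem(r, actions=[-1, 1], step_fn=step) for r in regions]
    q = dcs_srl(subs, budget=200)
    print(len(q), "state-action values learned")

In a parallel variant along the lines of DCS-SPRL, the per-subproblem sweep calls would be dispatched to worker processes (e.g., via Python's multiprocessing module) rather than run sequentially, with the scheduling weights playing the same role of directing computation toward the most productive regions.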