The main challenge in the area of reinforcement learning is scaling up to larger and more complex problems. To tackle this scaling problem, a scalable reinforcement learning method, DCS-SRL, is proposed on the basis of a divide-and-conquer strategy, and its convergence is proved. In this method, a learning problem in a large or continuous state space is decomposed into multiple smaller subproblems. Given a specific learning algorithm, each subproblem can be solved independently with limited available resources. In the end, the component solutions can be recombined to obtain the desired result. To address the question of how subproblems are prioritized in the scheduler, a weighted priority scheduling algorithm is proposed. This scheduling algorithm ensures that computation is focused on the regions of the problem space that are expected to be most productive. To expedite the learning process, a new parallel method, called DCS-SPRL, is derived by combining DCS-SRL with a parallel scheduling architecture. In the DCS-SPRL method, the subproblems are distributed among processors that can work in parallel. The experimental results show that learning based on DCS-SPRL has a fast convergence speed and good scalability.
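To make the decompose-solve-recombine loop concrete, the following is a minimal Python sketch of the general idea, not the paper's actual DCS-SRL algorithm. It assumes tabular Q-learning inside each state-space region and, as an illustrative choice in the spirit of prioritized sweeping, uses the magnitude of the most recent Bellman update as the scheduling weight. All names here (Subproblem, sweep, dcs_srl, step_fn) and all parameter values are hypothetical.

import heapq
import random


class Subproblem:
    """One small region of the state space, solved independently (illustrative)."""

    def __init__(self, states, actions, step_fn, alpha=0.1, gamma=0.95):
        self.states = states            # assumed disjoint from other regions
        self.actions = actions
        self.step_fn = step_fn          # (state, action) -> (next_state, reward)
        self.alpha, self.gamma = alpha, gamma
        self.q = {(s, a): 0.0 for s in states for a in actions}
        self.priority = float("inf")    # unexplored subproblems are scheduled first

    def sweep(self, episodes=10):
        """Run a few Q-learning episodes; return the largest update as the new weight."""
        max_delta = 0.0
        for _ in range(episodes):
            s = random.choice(self.states)
            for _ in range(20):                      # bounded episode length
                a = random.choice(self.actions)      # pure exploration, for brevity
                s2, r = self.step_fn(s, a)
                target = r
                if s2 in self.states:                # bootstrap only inside this region;
                    target += self.gamma * max(      # in the real method, boundary values
                        self.q[(s2, b)] for b in self.actions)  # come from recombination
                delta = target - self.q[(s, a)]
                self.q[(s, a)] += self.alpha * delta
                max_delta = max(max_delta, abs(delta))
                if s2 not in self.states:
                    break
                s = s2
        self.priority = max_delta
        return max_delta


def dcs_srl(subproblems, budget=100, tol=1e-3):
    """Weighted-priority scheduling: always refine the subproblem whose last
    sweep was most productive (largest recent update)."""
    heap = [(-sp.priority, i) for i, sp in enumerate(subproblems)]
    heapq.heapify(heap)
    for _ in range(budget):
        neg_p, i = heapq.heappop(heap)
        if -neg_p < tol:                 # every remaining subproblem has converged
            break
        p = subproblems[i].sweep()
        heapq.heappush(heap, (-p, i))
    # Recombine the component solutions into one global Q-table.
    merged = {}
    for sp in subproblems:
        merged.update(sp.q)
    return merged


if __name__ == "__main__":
    # Toy 1-D chain: states 0..19, actions move left/right, reward 1 at state 19.
    def step(s, a):
        s2 = max(0, min(19, s + a))
        return s2, (1.0 if s2 == 19 else 0.0)

    regions = [list(range(0, 10)), list(range(10, 20))]   # assumed-disjoint split
    subs = [Subproblem(r, actions=[-1, 1], step_fn=step) for r in regions]
    q = dcs_srl(subs, budget=200)
    print(len(q), "state-action values learned")

In a parallel variant along the lines of DCS-SPRL, the per-subproblem sweep calls would be dispatched to worker processes (e.g., via Python's multiprocessing module) rather than run sequentially, with the scheduling weights playing the same role of directing computation toward the most productive regions.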