在基于目标的强化学习任务中,欧式距离常作为启发式函数用于策略选择,其用于状态空间在欧式空间内不连续的任务效果不理想,针对此问题,引入流形学习中计算复杂度较低的拉普拉斯特征映射法,提出一种基于谱图的理论的启发式策略选择方法。
As a heuristic function, the Euclidean distance is usually used to select online action in reinforcement learning based on goal position. It is not applied to these tasks whose state spaces are not continuous in Euclidean space. For the problem, the Laplacian Eigenmap whose computational complexity is lower in manifold learning is introduced, then a method of heuristic policy selection based on the spectral graph theory is proposed. The proposed method is suitable for these tasks not only whose state spaces are continuous in some manifold that has a good estimation of intrinsic dimension, but also whose connection relation is expressed by an undirected graph. The simulation results of grid world show the effectiveness of the proposed method.