A partially observable Markov decision process (POMDP) is a mathematical model for decision making under uncertainty. Point-based value iteration algorithms are a class of approximate methods for solving POMDPs. To address belief selection, a key problem in point-based algorithms, we propose an entropy-based belief selection method (EBBS). EBBS computes the uncertainty (entropy) of the belief points that can be reached by transition, and expands the belief set with points that have low entropy and whose distance to the current belief set exceeds a threshold. Experimental results show that value iteration with entropy-based belief selection needs to perform value-iteration operations on only a small number of belief points to obtain the expected discounted reward.
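The selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance metric, how many points are added per step, or the data layout, so the L1 distance, the "one successor per belief" rule, the nested-list model layout `T[a][s][s2]` and `O[a][s2][o]`, and all function names here are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

Belief = List[float]


def entropy(b: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a belief; terms with p == 0 contribute 0."""
    return -sum(p * math.log(p) for p in b if p > 0.0)


def l1_distance_to_set(b: Sequence[float], beliefs: Sequence[Sequence[float]]) -> float:
    """Smallest L1 distance from b to any belief in the set (assumed metric)."""
    return min(sum(abs(x - y) for x, y in zip(b, other)) for other in beliefs)


def belief_update(b: Sequence[float], a: int, o: int, T, O) -> Optional[Belief]:
    """Bayes update of belief b after action a and observation o.

    Assumed layout: T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a).
    Returns None when observation o has zero probability under (b, a).
    """
    n = len(b)
    unnorm = [
        O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnorm)
    if total <= 0.0:
        return None
    return [u / total for u in unnorm]


def expand_by_entropy(beliefs: Sequence[Belief], T, O, threshold: float) -> List[Belief]:
    """One expansion pass over the belief set.

    For each belief, enumerate its successors over all (action, observation)
    pairs, keep those farther than `threshold` from the set built so far, and
    add the one with the lowest entropy (if any survive the distance filter).
    """
    n_actions = len(T)
    n_obs = len(O[0][0])
    new_set = [list(b) for b in beliefs]
    for b in beliefs:
        candidates = []
        for a in range(n_actions):
            for o in range(n_obs):
                nb = belief_update(b, a, o, T, O)
                if nb is not None and l1_distance_to_set(nb, new_set) > threshold:
                    candidates.append(nb)
        if candidates:
            new_set.append(min(candidates, key=entropy))
    return new_set


# Toy model (hypothetical): 2 states, 1 "listen" action that leaves the state
# unchanged, and 2 observations that report the true state with accuracy 0.85.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
expanded = expand_by_entropy([[0.5, 0.5]], T, O, threshold=0.1)
```

In this toy model the uniform starting belief has the maximum possible entropy, so any successor that passes the distance filter is more certain than the point it came from. Preferring low-entropy successors in this way is one plausible reading of how the method keeps the belief set small.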