Inspired by the biological prototype in which individual agents in a group perceive and interact only locally, a local learning algorithm for multi-agent systems is proposed within the framework of stochastic games. In the algorithm, each agent adopts a greedy policy to maximize its own payoff while interacting with its local environment. The Nash-Q learning algorithm is improved for zero-sum games and for general-sum games with either a single equilibrium or multiple equilibria. A behavior-modification method is also proposed, and it is proved that the algorithm converges and that its computational complexity is reduced.