Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)



Xiaofeng Wang, Tuomas Sandholm


Multiagent learning is a key problem in AI. In the presence of multiple Nash equilibria, even agents with non-conflicting interests may not be able to learn an optimal coordination policy. The problem is exacerbated if the agents do not know the game and independently receive noisy payoffs. So, multiagent reinforcement learning involves two interrelated problems: identifying the game and learning to play. In this paper, we present optimal adaptive learning, the first algorithm that converges to an optimal Nash equilibrium with probability 1 in any team Markov game. We provide a convergence proof, and show that the algorithm’s parameters are easy to set to meet the convergence conditions.