Part of Advances in Neural Information Processing Systems 18 (NIPS 2005)
Martin Zinkevich, Amy Greenwald, Michael Littman
Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demon- strate by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value it- eration based on a new (non-stationary) equilibrium concept that we call “cyclic equilibria.” We prove that value iteration identifies cyclic equi- libria in a class of games in which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.