Martin Zinkevich, Amy Greenwald, Michael Littman
Although variants of value iteration have been proposed for ﬁnding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demon- strate by construction that existing variants of value iteration cannot ﬁnd stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value it- eration based on a new (non-stationary) equilibrium concept that we call “cyclic equilibria.” We prove that value iteration identiﬁes cyclic equi- libria in a class of games in which it fails to ﬁnd stationary equilibria. We also demonstrate empirically that value iteration ﬁnds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.