{"title": "No-Regret Learning in Bayesian Games", "book": "Advances in Neural Information Processing Systems", "page_first": 3061, "page_last": 3069, "abstract": "Recent price-of-anarchy analyses of games of complete information suggest that coarse correlated equilibria, which characterize outcomes resulting from no-regret learning dynamics, have near-optimal welfare. This work provides two main technical results that lift this conclusion to games of incomplete information, a.k.a., Bayesian games. First, near-optimal welfare in Bayesian games follows directly from the smoothness-based proof of near-optimal welfare in the same game when the private information is public. Second, no-regret learning dynamics converge to Bayesian coarse correlated equilibrium in these incomplete information games. These results are enabled by interpretation of a Bayesian game as a stochastic game of complete information.", "full_text": "No-Regret Learning in Bayesian Games\n\nJason Hartline\n\nNorthwestern University\n\nEvanston, IL\n\nhartline@northwestern.edu\n\nVasilis Syrgkanis\nMicrosoft Research\n\nNew York, NY\n\nvasy@microsoft.com\n\n\u00b4Eva Tardos\n\nCornell University\n\nIthaca, NY\n\neva@cs.cornell.edu\n\nAbstract\n\nRecent price-of-anarchy analyses of games of complete information suggest that\ncoarse correlated equilibria, which characterize outcomes resulting from no-regret\nlearning dynamics, have near-optimal welfare. This work provides two main tech-\nnical results that lift this conclusion to games of incomplete information, a.k.a.,\nBayesian games. First, near-optimal welfare in Bayesian games follows directly\nfrom the smoothness-based proof of near-optimal welfare in the same game when\nthe private information is public. 
Second, no-regret learning dynamics converge to Bayesian coarse correlated equilibrium in these incomplete information games. These results are enabled by interpretation of a Bayesian game as a stochastic game of complete information.\n\n1 Introduction\n\nA recent confluence of results from game theory and learning theory gives a simple explanation for why good outcomes in large families of strategically-complex games can be expected. The advance comes from (a) a relaxation of the classical notion of equilibrium in games to one that corresponds to the outcome attained when players' behavior ensures asymptotic no-regret, e.g., via standard online learning algorithms such as weighted majority, and (b) an extension theorem that shows that the standard approach for bounding the quality of classical equilibria automatically implies the same bounds on the quality of no-regret equilibria. This paper generalizes these results from static games to Bayesian games, for example, auctions.\n\nOur motivation for considering learning outcomes in Bayesian games is the following. Many important games model repeated interactions between an uncertain set of participants. Sponsored search, and more generally, online ad-auction marketplaces, are important examples of such games. Platforms are running millions of auctions, with each individual auction slightly different and of only very small value, but such marketplaces have high enough volume to be the financial basis of large industries. This online auction environment is best modeled by a repeated Bayesian game: the auction game is repeated over time, with the set of participants slightly different each time, depending on many factors, from budgets of the players to subtle differences in the opportunities.\n\nA canonical example to which our methods apply is a single-item first-price auction with players' values for the item drawn from a product distribution. 
In such an auction, players simultaneously submit sealed bids and the player with the highest bid wins and pays her bid. The utility of the winner is her value minus her bid; the utilities of the losers are zero. When the values are drawn from non-identical continuous distributions the Bayes-Nash equilibrium is given by a differential equation that is not generally analytically tractable, cf. [8] (and generalizations of this model are computationally hard, see [3]). Again, though their Bayes-Nash equilibria are complex, we show that good outcomes can be expected in these kinds of auctions.\n\nOur approach to proving that good equilibria can be expected in repeated Bayesian games is to extend an analogous result for static games,1 i.e., the setting where the same game with the same payoffs and the same players is repeated. Nash equilibrium is the classical model of equilibrium for each stage of the static game. In such an equilibrium the strategies of players may be randomized; however, the randomizations of the players are independent. To measure the quality of outcomes in games Koutsoupias and Papadimitriou [9] introduced the price of anarchy, the ratio of the quality of the worst Nash equilibrium over a socially optimal solution. Price of anarchy results have been shown for large families of games, with a focus on those relevant for computer networks. Roughgarden [11] identified the canonical approach for bounding the price of anarchy of a game as showing that it satisfies a natural smoothness condition.\n\nThere are two fundamental flaws with Nash equilibrium as a description of strategic behavior. First, computing a Nash equilibrium can be PPAD hard and, thus, neither should efficient algorithms for computing a Nash equilibrium be expected nor should any dynamics (of players with bounded computational capabilities) converge to a Nash equilibrium. 
Second, natural behavior tends to introduce correlations in strategies and therefore does not converge to Nash equilibrium even in the limit.\n\nBoth of these issues can be resolved for large families of games. First, there are relaxations of Nash equilibrium which allow for correlation in the players' strategies. Of these, this paper will focus on coarse correlated equilibrium, which requires that the expected payoff of a player for the correlated strategy be no worse than the expected payoff of any action at the player's disposal. Second, it was proven by Blum et al. [2] that the (asymptotic) no-regret property of many online learning algorithms implies convergence to the set of coarse correlated equilibria.2\n\nBlum et al. [2] extended the definition of the price of anarchy to outcomes obtained when each player follows a no-regret learning algorithm.3 As coarse correlated equilibria generalize Nash equilibria, it could be that the worst-case equilibrium under the former is worse than under the latter. Roughgarden [11], however, observed that there is often no degradation; specifically, the very same smoothness property that he identified as implying good welfare in Nash equilibrium also proves good welfare of coarse correlated equilibrium (equivalently: for outcomes from no-regret learners). Thus, for a large family of static games, we can expect strategic behavior to lead to good outcomes.\n\nThis paper extends this theory to Bayesian games. Our contribution is two-fold: (i) We show an analog of the convergence of no-regret learning to coarse correlated equilibria in Bayesian games, which is of interest independently of our price of anarchy analysis; and (ii) we show that the coarse correlated equilibria of the Bayesian version of any smooth static game have good welfare. Combining these results, we conclude that no-regret learning in smooth Bayesian games achieves good welfare.\n\nThese results are obtained as follows. 
It is possible to view a Bayesian game as a stochastic game, i.e., where the payoff structure is fixed but there is a random action on the part of Nature. This viewpoint applied to the above auction example considers a population of bidders associated with each player and, in each stage, Nature uniformly at random selects one bidder from each population to participate in the auction. We re-interpret and strengthen a result of Syrgkanis and Tardos [12] by showing that the smoothness property of the static game (for any fixed profile of bidder values) implies smoothness of this stochastic game. From the perspective of coarse correlated equilibrium, there is no difference between a stochastic game and the non-stochastic game with each random variable replaced with its expected value. Thus, the smoothness framework of Roughgarden [11] extends this result to imply that the coarse correlated equilibria of the stochastic game are good.\n\n1In the standard terms of the game theory literature, we extend results for learning in games of complete information to games of incomplete information.\n\n2This result is a generalization of one of Foster and Vohra [7].\n\n3They referred to this price of anarchy for no-regret learners as the price of total anarchy.\n\nTo show that we can expect good outcomes in Bayesian games, it suffices to show that no-regret learning converges to the coarse correlated equilibrium of this stochastic game. Importantly, when we consider learning algorithms there is a distinction between the stochastic game where players' payoffs are random variables and the non-stochastic game where players' payoffs are the expectation of these variables. Our analysis addresses this distinction and, in particular, shows that, in the stochastic game on populations, no-regret learning converges almost surely to the set of coarse correlated equilibria. 
This result implies that the average welfare of no-regret dynamics will be good, almost surely, and not only in expectation over the random draws of Nature.\n\n2 Preliminaries\n\nThis section describes a general game theoretic environment which includes auctions and resource allocation mechanisms. For this general environment we review the results from the literature for analyzing the social welfare that arises from no-regret learning dynamics in repeated game play. The subsequent sections of the paper will generalize this model and these results to Bayesian games, a.k.a., games of incomplete information.\n\nGeneral Game Form. A general game M is specified by a mapping from a profile a ∈ A ≡ A1 × ··· × An of allowable actions of players to an outcome. Behavior in a game may result in (possibly correlated) randomized actions a ∈ ∆(A).4 Player i's utility in this game is determined by a profile of individual values v ∈ V ≡ V1 × ··· × Vn and the (implicit) outcome of the game; it is denoted Ui(a; vi) = Ea∼a [Ui(a; vi)]. In games with a social planner or principal who does not take an action in the game, the utility of the principal is R(a) = Ea∼a [R(a)]. In many games of interest, such as auctions or allocation mechanisms, the utility of the principal is the revenue from payments from the players. We will use the terms mechanism and game interchangeably.\n\nIn a static game the payoffs of the players (given by v) are fixed. Subsequent sections will consider Bayesian games in the independent private value model, i.e., where player i's value vi is drawn independently from the other players' values and is known only privately to player i. Classical game theory assumes complete information for static games, i.e., that v is known, and incomplete information in Bayesian games, i.e., that the distribution over V is known. 
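As a concrete instance of this general game form, the single-item first-price auction from the introduction can be written out directly. The following is an illustrative sketch, not code from the paper; the tie-breaking rule (lowest index wins) and the numeric profiles are assumptions made for the example.

```python
# Illustrative sketch: the general game form instantiated as a single-item
# first-price auction. An action profile is a profile of bids; the highest
# bidder wins (ties broken toward the lowest index) and pays her bid.

def ui(a, i, vi):
    """Utility Ui(a; vi) of player i with value vi under bid profile a."""
    winner = max(range(len(a)), key=lambda j: a[j])  # ties -> lowest index
    return (vi - a[i]) if winner == i else 0.0

def revenue(a):
    """Principal's utility R(a): the winning bid."""
    return max(a)

def social_welfare(a, v):
    """SW(a; v) = sum_i Ui(a; vi) + R(a)."""
    return sum(ui(a, i, v[i]) for i in range(len(a))) + revenue(a)

v = (1.0, 0.6)  # valuation profile (an assumption for the example)
a = (0.5, 0.4)  # bid profile: player 0 wins and pays 0.5
assert social_welfare(a, v) == v[0]  # welfare equals the winner's value
```

Since the payment is a transfer from the winner to the principal, SW(a; v) always equals the winner's value, and the optimal welfare is OPT(v) = max_i vi.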
For our study of learning in games no assumptions of knowledge are made; however, to connect to the classical literature we will use its terminology of complete and incomplete information to refer to static and Bayesian games, respectively.\n\nSocial Welfare. We will be interested in analyzing the quality of the outcome of the game as defined by the social welfare, which is the sum of the utilities of the players and the principal. We will denote by SW(a; v) = Σ_{i∈[n]} Ui(a; vi) + R(a) the expected social welfare of mechanism M under a randomized action profile a. For any valuation profile v ∈ V we will denote the optimal social welfare, i.e., the maximum over outcomes of the game of the sum of utilities, by OPT(v).\n\nNo-regret Learning and Coarse Correlated Equilibria. For complete information games, i.e., a fixed valuation profile v, Blum et al. [2] analyzed repeated play of players using no-regret learning algorithms, and showed that this play converges to a relaxation of Nash equilibrium, namely, coarse correlated equilibrium.\n\nDefinition 1 (no regret). A player achieves no regret in a sequence of play a^1, . . . , a^T if his regret against any fixed strategy a′i vanishes to zero:\n\nlim_{T→∞} (1/T) Σ_{t=1}^{T} (Ui(a′i, a^t_{−i}; vi) − Ui(a^t; vi)) = 0.    (1)\n\nDefinition 2 (coarse correlated equilibrium, CCE). A randomized action profile a ∈ ∆(A) is a coarse correlated equilibrium of a complete information game with valuation profile v if for every player i and a′i ∈ Ai:\n\nEa [Ui(a; vi)] ≥ Ea [Ui(a′i, a−i; vi)].    (2)\n\nTheorem 3 (Blum et al. [2]). The empirical distribution of actions of any no-regret sequence in a repeated game converges to the set of CCE of the static game.\n\nPrice of Anarchy of CCE. 
Roughgarden [11] gave a unifying framework for comparing the social welfare, under various equilibrium notions including coarse correlated equilibrium, to the optimal social welfare by defining the notion of a smooth game. This framework was extended to games like auctions and allocation mechanisms by Syrgkanis and Tardos [12].\n\n4Bold-face symbols denote random variables.\n\nGame/Mechanism | (λ, µ) | POA | Reference\nSimultaneous First Price Auction with Submodular Bidders | (1 − 1/e, 1) | e/(e − 1) | [12]\nFirst Price Multi-Unit Auction | (1 − 1/e, 1) | e/(e − 1) | [5]\nFirst Price Position Auction | (1/2, 1) | 2 | [12]\nAll-Pay Auction | (1/2, 1) | 2 | [12]\nGreedy Combinatorial Auction with d-complements | (1 − 1/e, d) | de/(e − 1) | [10]\nProportional Bandwidth Allocation Mechanism | (1/4, 1) | 4 | [12]\nSubmodular Welfare Games | (1, 1) | 2 | [13, 11]\nCongestion Games with Linear Delays | (5/3, 1/3) | 5/2 | [11]\n\nFigure 1: Examples of smooth games and mechanisms\n\nDefinition 4 (smooth mechanism). A mechanism M is (λ, µ)-smooth, for λ, µ ≥ 0, if there exists an independent randomized action profile a∗(v) ∈ ∆(A1) × ··· × ∆(An) for each valuation profile v, such that for any action profile a ∈ A and valuation profile v ∈ V:\n\nΣ_{i∈[n]} Ui(a∗i(v), a−i; vi) ≥ λ · OPT(v) − µ · R(a).    (3)\n\nMany important games and mechanisms satisfy this smoothness definition for various parameters λ and µ (see Figure 1); the following theorem shows that the welfare of any coarse correlated equilibrium in any of these games is nearly optimal.\n\nTheorem 5 (efficiency of CCE; [12]). 
If a mechanism is (λ, µ)-smooth then the social welfare of any coarse correlated equilibrium is at least λ/max{1, µ} of the optimal welfare, i.e., the price of anarchy satisfies POA ≤ max{1, µ}/λ.\n\nPrice of Anarchy of No-regret Learning. Following Blum et al. [2], Theorem 3 and Theorem 5 imply that no-regret learning dynamics have near-optimal social welfare.\n\nCorollary 6 (efficiency of no-regret dynamics; [12]). If a mechanism is (λ, µ)-smooth then any no-regret dynamics of the repeated game with a fixed player set and valuation profile achieves average social welfare at least λ/max{1, µ} of the optimal welfare, i.e., the price of anarchy satisfies POA ≤ max{1, µ}/λ.\n\nImportantly, Corollary 6 holds the valuation profile v ∈ V fixed throughout the repeated game play. The main contribution of this paper is in extending this theory to games of incomplete information, e.g., where the values of the players are drawn at random in each round of game play.\n\n3 Population Interpretation of Bayesian Games\n\nIn the standard independent private value model of a Bayesian game there are n players. Player i has type vi drawn uniformly from the set of types Vi (and this distribution is denoted Fi).5 We will restrict attention to the case when the type space Vi is finite. A player's strategy in this Bayesian game is a mapping si : Vi → Ai from a valuation vi ∈ Vi to an action ai ∈ Ai. We will denote with Σi = Ai^{Vi} the strategy space of each player and with Σ = Σ1 × ··· × Σn the joint strategy space. In the game, each player i realizes his type vi from the distribution and then makes action si(vi) in the game.\n\nIn the population interpretation of the Bayesian game, also called the agent normal form representation [6], there are n finite populations of players. 
Each player in population i has a type vi, which we assume to be distinct for each player in each population and across populations.6 The set of players in population i is denoted Vi, and the player in population i with type vi is called player vi. In the population game, each player vi chooses an action si(vi). Nature uniformly draws one player from each population, and the game is played with those players' actions. In other words, the utility of player vi from population i is:\n\nU^AG_{i,vi}(s) = Ev [Ui(s(v); vi) · 1{vi = vi}]    (4)\n\nNotice that the population interpretation of the Bayesian game is in fact a stochastic game of complete information.\n\n5The restriction to the uniform distribution is without loss of generality for any finite type space and for any distribution over the type space that involves only rational probabilities.\n\n6The restriction to distinct types is without loss of generality as we can always augment a type space with an index that does not affect player utilities.\n\nThere are multiple generalizations of coarse correlated equilibria from games of complete information to games of incomplete information (cf. [6], [1], [4]). One of the canonical definitions is simply the coarse correlated equilibrium of the stochastic game of complete information that is defined by the population interpretation above.7\n\nDefinition 7 (Bayesian coarse correlated equilibrium - BAYES-CCE). A randomized strategy profile s ∈ ∆(Σ) is a Bayesian coarse correlated equilibrium if for every a′i ∈ Ai and for every vi ∈ Vi:\n\nEsEv [Ui(s(v); vi) | vi = vi] ≥ EsEv [Ui(a′i, s−i(v−i); vi) | vi = vi]    (5)\n\nIn a game of incomplete information the welfare in equilibrium will be compared to the expected ex-post optimal social welfare Ev[OPT(v)]. 
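To make equation (4) concrete, the following sketch evaluates the agent-form utility for a toy instance. It is an illustration only: the two populations, their type sets, the pure strategies, and the helper names (ui, u_ag) are assumptions made for the example, with the first-price auction as the stage game.

```python
# Illustrative sketch (assumptions: two populations, first-price auction as
# the stage game M). Computes the agent-form utility of equation (4):
#   U^AG_{i,vi}(s) = E_v[ Ui(s(v); vi) * 1{v_i = vi} ]
from itertools import product

def ui(a, i, vi):
    winner = max(range(len(a)), key=lambda j: a[j])  # ties -> lowest index
    return (vi - a[i]) if winner == i else 0.0

populations = [(1.0, 0.6), (0.8,)]      # types in populations 1 and 2
s = [{1.0: 0.5, 0.6: 0.3}, {0.8: 0.4}]  # strategies: type -> bid

def u_ag(i, vi):
    profiles = list(product(*populations))   # Nature draws uniformly
    total = 0.0
    for v in profiles:
        if v[i] == vi:                       # indicator 1{v_i = vi}
            a = tuple(s[j][v[j]] for j in range(len(v)))
            total += ui(a, i, vi)
    return total / len(profiles)

# Type 1.0 is drawn with probability 1/2, and then wins with bid 0.5
assert u_ag(0, 1.0) == 0.25
```

A BAYES-CCE check in the sense of Definition 7 would compare u_ag under the played strategy profile against every fixed deviating action a′i, conditional on each type.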
We will refer to the worst-case ratio of the expected optimal social welfare over the expected social welfare of any BAYES-CCE as the BAYES-CCE-POA.\n\n4 Learning in Repeated Bayesian Game\n\nConsider a repeated version of the population interpretation of a Bayesian game. At each iteration one player vi from each population is sampled uniformly and independently from other populations. The chosen players then participate in an instance of a mechanism M. We assume that each player vi ∈ Vi uses some no-regret learning rule to play in this repeated game.8 In Definition 8, we describe the structure of the game and our notation more elaborately.\n\nDefinition 8. The repeated Bayesian game of M proceeds as follows. In stage t:\n\n1. Each player vi ∈ Vi in each population i picks an action s^t_i(vi) ∈ Ai. We denote with s^t_i ∈ Ai^{|Vi|} the function that maps a player vi ∈ Vi to his action.\n\n2. From each population i one player v^t_i ∈ Vi is selected uniformly at random. Let v^t = (v^t_1, . . . , v^t_n) be the chosen profile of players and s^t(v^t) = (s^t_1(v^t_1), . . . , s^t_n(v^t_n)) be the profile of chosen actions.\n\n3. Each player v^t_i participates in an instance of game M, in the role of player i ∈ [n], with action s^t_i(v^t_i) and experiences a utility of Ui(s^t(v^t); v^t_i). All players not selected in Step 2 experience zero utility.\n\nRemark. We point out that for each player in a population to achieve no regret he does not need to know the distribution of values in other populations. There exist algorithms that achieve the no-regret property and simply require an oracle that returns the utility of a player at each iteration. Thus all we need to assume is that each player receives as feedback his utility at each iteration.\n\nRemark. 
We also note that our results would extend to the case where at each period multiple matchings are sampled independently and players potentially participate in more than one instance of the mechanism M, potentially with different players from the remaining populations. The only thing that the players need to observe in such a setting is the average utility that resulted from their action s^t_i(vi) ∈ Ai over all the instances in which they participated in the given period. Such a scenario seems an appealing model of online ad auction marketplaces where players receive only average utility feedback on their bids.\n\n7This notion is the coarse analog of the agent normal form Bayes correlated equilibrium defined in Section 4.2 of Forges [6].\n\n8An equivalent and standard way to view a Bayesian game is that each player draws his value independently from his distribution each time the game is played. In this interpretation the player plays by choosing a strategy that maps his value to an action (or distribution over actions). In this interpretation our no-regret condition requires that the player not regret his actions for each possible value.\n\nBayesian Price of Anarchy for No-regret Learners. In this repeated game setting we want to compare the average social welfare of any sequence of play where each player uses a vanishing-regret algorithm to the average optimal welfare. Moreover, we want to quantify the worst-case such average welfare over all possible valuation distributions within each population:\n\nsup_{F1,...,Fn} limsup_{T→∞} [Σ_{t=1}^{T} OPT(v^t)] / [Σ_{t=1}^{T} SW^M(s^t(v^t); v^t)]    (6)\n\nWe will refer to this quantity as the Bayesian price of anarchy for no-regret learners. 
The numerator of this term is simply the average optimal welfare when players from each population are drawn independently in each stage; it converges almost surely to the expected ex-post optimal welfare Ev[OPT(v)] of the stage game. Our main theorem is that if the mechanism is smooth and players follow no-regret strategies then the expected welfare is guaranteed to be close to the optimal welfare.\n\nTheorem 9 (Main Theorem). If a mechanism is (λ, µ)-smooth then any no-regret dynamics of the repeated Bayesian game achieves average (over time) social welfare at least λ/max{1, µ} of the average optimal welfare, i.e., POA ≤ max{1, µ}/λ, almost surely.\n\nRoadmap of the proof. In Section 5, we show that any vanishing-regret sequence of play of the repeated Bayesian game converges almost surely to the Bayesian version of a coarse correlated equilibrium of the incomplete information stage game. Therefore the Bayesian price of total anarchy is upper bounded by the efficiency guarantee of any Bayesian coarse correlated equilibrium. Finally, in Section 6 we show that the price of anarchy bound of smooth mechanisms directly extends to Bayesian coarse correlated equilibria, thereby providing an upper bound on the Bayesian price of total anarchy of the repeated game.\n\nRemark. We point out that our definition of BAYES-CCE is inherently different from and more restricted than the one defined in Caragiannis et al. [4]. 
There, a BAYES-CCE is defined as a joint distribution D over V × A, such that if (v, a) ∼ D then for any vi ∈ Vi and a′i(vi) ∈ Ai:\n\nE(v,a) [Ui(a; vi)] ≥ E(v,a) [Ui(a′i(vi), a−i; vi)]    (7)\n\nThe main difference is that the product distribution defined by a distribution in ∆(Σ) and the distribution of values cannot produce every possible joint distribution over (V, A); rather, the resulting joint distributions are restricted to satisfy a conditional independence property described by [6], namely that player i's action is conditionally independent of some other player j's value, given player i's type. Such a conditional independence property is essential for the guarantees that we present in this work to extend to a BAYES-CCE, and hence they do not seem to extend to the notion given in [4]. However, as we will show in Section 5, the no-regret dynamics that we analyze, which are mathematically equivalent to the dynamics in [4], do converge to this smaller set of BAYES-CCE that we define and for which our efficiency guarantees extend. This extra convergence property is not needed when the mechanism satisfies the stronger semi-smoothness property defined in [4], and thereby was not needed to show efficiency bounds in their setting.\n\n5 Convergence of Bayesian No-Regret to BAYES-CCE\n\nIn this section we show that no-regret learning in the repeated Bayesian game converges almost surely to the set of Bayesian coarse correlated equilibria. Any given sequence of play of the repeated Bayesian game, which we defined in Definition 8, gives rise to a sequence of strategy-value pairs (s^t, v^t), where s^t = (s^t_1, . . . , s^t_n) and s^t_i ∈ Ai^{Vi} captures the actions that each player vi in population i would have chosen, had they been picked. 
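The dynamics that generate such a sequence (Definition 8) can be sketched end to end. This is a minimal illustration under stated assumptions, not the paper's construction: the bid grid, the Hedge learning rate, the horizon, and the population types are all invented for the example, and the stage game is the single-item first-price auction.

```python
# Illustrative sketch of the repeated Bayesian game of Definition 8: every
# member of every population runs Hedge (multiplicative weights) over a
# finite bid grid, and Nature draws one member per population each stage.
import math
import random

random.seed(0)
BIDS = [0.0, 0.2, 0.4, 0.6, 0.8]  # finite action (bid) grid, an assumption
ETA = 0.5                          # Hedge learning rate, an assumption

def ui(a, i, vi):
    """First-price auction utility; ties go to the lowest index."""
    winner = max(range(len(a)), key=lambda j: a[j])
    return (vi - a[i]) if winner == i else 0.0

populations = [[1.0, 0.6], [0.8, 0.4]]  # player types per population
# one Hedge weight vector per population member
weights = [[[1.0] * len(BIDS) for _ in pop] for pop in populations]

def sample(w):
    return random.choices(range(len(BIDS)), weights=w)[0]

for t in range(2000):
    # every member picks an action; Nature then draws one member per population
    picks = [[sample(w) for w in pw] for pw in weights]
    drawn = [random.randrange(len(pop)) for pop in populations]
    a = tuple(BIDS[picks[i][drawn[i]]] for i in range(len(populations)))
    # Hedge update for the drawn members (for simplicity this sketch gives
    # them full counterfactual feedback; bandit-feedback no-regret rules
    # would also suffice, cf. the remark above)
    for i, pop in enumerate(populations):
        vi = pop[drawn[i]]
        w = weights[i][drawn[i]]
        for k, b in enumerate(BIDS):
            w[k] *= math.exp(ETA * ui(a[:i] + (b,) + a[i + 1:], i, vi))
        m = max(w)                       # normalize to avoid overflow
        weights[i][drawn[i]] = [x / m for x in w]
```

Each member updates only in the stages in which it is drawn, and the empirical distribution of the resulting (strategy, value) pairs is exactly the object analyzed in this section.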
Then observe that all that matters to compute the average social welfare of the game for any given time step T is the empirical distribution of pairs (s, v) up till time step T, denoted DT; i.e., if (sT, vT) is a random sample from DT:\n\n(1/T) Σ_{t=1}^{T} SW(s^t(v^t); v^t) = E_{(sT,vT)} [SW(sT(vT); vT)]    (8)\n\nLemma 10 (Almost sure convergence to BAYES-CCE). Consider a sequence of play of the random matching game, where each player uses a vanishing-regret algorithm, and let DT be the empirical distribution of (strategy, valuation) profile pairs up till time step T. Consider any subsequence of {DT}T that converges in distribution to some distribution D. Then, almost surely, D is a product distribution, i.e., D = Ds × Dv, with Ds ∈ ∆(Σ) and Dv ∈ ∆(V), such that Dv = F and Ds ∈ BAYES-CCE of the static incomplete information game with distributional beliefs F.\n\nProof. We will denote with\n\nri(a∗i, a; vi) = Ui(a∗i, a−i; vi) − Ui(a; vi)\n\nthe regret of player vi from population i for action a∗i at action profile a. For a vi ∈ Vi let x^t_i(vi) = 1{v^t_i = vi}. Since the sequence has vanishing regret for each player vi in population Pi, it must be that for any s∗i ∈ Σi:\n\nΣ_{t=1}^{T} x^t_i(vi) · ri(s∗i(vi), s^t(v^t); vi) ≤ o(T)    (9)\n\nFor any fixed T, let D^T_s ∈ ∆(Σ) denote the empirical distribution of s^t and let s be a random sample from D^T_s. For each s ∈ Σ, let Ts ⊂ [T] denote the time steps such that s^t = s for each t ∈ Ts. 
Then we can re-write Equation (9) as:\n\nEs [ (1/|Ts|) Σ_{t∈Ts} x^t_i(vi) · ri(s∗i(vi), s^t(v^t); vi) ] ≤ o(T)/T    (10)\n\nFor any s ∈ Σ and w ∈ V, let Ts,w = {t ∈ Ts : v^t = w}. Then we can re-write Equation (10) as:\n\nEs [ Σ_{w∈V} (|Ts,w|/|Ts|) · 1{wi = vi} · ri(s∗i(vi), s(w); vi) ] ≤ o(T)/T    (11)\n\nNow we observe that |Ts,w|/|Ts| is the empirical frequency of the valuation vector w ∈ V, filtered at time steps where the strategy vector was s. Since at each time step t the valuation vector v^t is picked independently from the distribution of valuation profiles F, this is the empirical frequency of |Ts| independent samples from F.\n\nBy standard arguments from empirical process theory, if |Ts| → ∞ then this empirical distribution converges almost surely to the distribution F. On the other hand, if |Ts| does not go to ∞, then the empirical frequency of strategy s vanishes to 0 as T → ∞ and therefore has measure zero in the above expectation as T → ∞. Thus for any convergent subsequence of {DT}, if D is the limit distribution and s is in the support of D, then almost surely the distribution of w conditional on strategy s is F. Thus we can write D as a product distribution Ds × F.\n\nMoreover, if we denote with w the random variable that follows distribution F, then the limit of Equation (11) for any convergent subsequence gives that:\n\na.s.: E_{s∼Ds} E_{w∼F} [1{wi = vi} · ri(s∗i(vi), s(w); vi)] ≤ 0\n\nEquivalently, we get that Ds will satisfy, for all vi ∈ Vi and for all s∗i:\n\na.s.: E_{s∼Ds} E_{w∼F} [ri(s∗i(wi), s(w); wi) | wi = vi] ≤ 0\n\nThe latter is exactly the BAYES-CCE condition from Definition 7. 
Thus Ds is in the set of BAYES-CCE of the static incomplete information game among n players, where the type profile is drawn from F.\n\nGiven the latter convergence theorem we can easily conclude the following theorem, whose proof is given in the supplementary material.\n\nTheorem 11. The price of anarchy for Bayesian no-regret dynamics is upper bounded by the price of anarchy of Bayesian coarse correlated equilibria, almost surely.\n\n6 Efficiency of Smooth Mechanisms at Bayes Coarse Correlated Equilibria\n\nIn this section we show that smoothness of a mechanism M implies that any BAYES-CCE of the incomplete information setting achieves at least λ/max{1, µ} of the expected optimal welfare. To show this we will adopt the interpretation of BAYES-CCE that we used in the previous section, as coarse correlated equilibria of a more complex normal form game: the stochastic agent normal form representation of the Bayesian game. We can interpret this complex normal form game as the game that arises from a complete information mechanism M^AG among Σ_i |Vi| players, which randomly samples one player from each of the n populations and where the utility of a player in the complete information mechanism M^AG is given by Equation (4). The set of possible outcomes in this agent game corresponds to the set of mappings from a profile of chosen players to an outcome in the underlying mechanism M. The optimal welfare of this game is then the expected ex-post optimal welfare OPT^AG = Ev [OPT(v)].\n\nThe main theorem that we will show is that whenever mechanism M is (λ, µ)-smooth, then mechanism M^AG is also (λ, µ)-smooth. Then we will invoke a theorem of [12, 11], which shows that any coarse correlated equilibrium of a complete information mechanism achieves at least λ/max{1, µ} of the optimal welfare. 
By the equivalence between BAYES-CCE and CCE of this complete information game, we get that every BAYES-CCE of the Bayesian game achieves at least λ/max{1, µ} of the expected optimal welfare.\n\nTheorem 12 (From complete information to Bayesian smoothness). If a mechanism M is (λ, µ)-smooth, then for any vector of independent valuation distributions F = (F1, . . . , Fn), the complete information mechanism M^AG is also (λ, µ)-smooth.\n\nProof. Consider the following randomized deviation for each player vi ∈ Vi in population i: he randomly samples a valuation profile w ∼ F and then plays according to the randomized action s∗i(vi, w−i), i.e., the player deviates using the randomized action guaranteed by the smoothness property of mechanism M for his type vi and the random sample w−i of the types of the others.\n\nConsider an arbitrary action profile s = (s1, . . . , sn) for all players in all populations. In this context it is better to think of each si as a |Vi|-dimensional vector in Ai^{|Vi|} and to view s as a Σ_i |Vi|-dimensional vector. Then with s−vi we will denote all the components of this large vector except the ones corresponding to player vi ∈ Vi. Moreover, we will be denoting with v a sample from F drawn by mechanism M^AG. 
We now argue about the expected utility of player vi from this deviation,\nwhich is:\nEw\n\nand to view s as a(cid:80)\n\ni (vi, w\u2212i), s\u2212i(v\u2212i); vi) \u00b7 1{vi = vi}]\n\ni (vi, w\u2212i), s\u2212vi)(cid:3) = EwEv [Ui(s\u2217\n(cid:2)(cid:80)\ni (vi, w\u2212i), s\u2212vi)(cid:3) = Ew,v\n\nSumming the latter over all players vi \u2208 Vi in population i:\nUi(s\u2217\n\n(cid:2)U AG\n(cid:2)U AG\n\ni (vi, w\u2212i), s\u2212i(v\u2212i); vi) \u00b7 1{vi = vi}(cid:3)\n\n(cid:88)\n\nEw\n\n(s\u2217\n\n(s\u2217\n\ni,vi\n\ni\n\ni,vi\n\nvi\u2208Vi\n\nvi\u2208Vi\n\n= Ev,w [Ui(s\u2217\n= Ev,w [Ui(s\u2217\n= Ev,w [Ui(s\u2217\n\ni (vi, w\u2212i), s\u2212i(v\u2212i); vi)]\ni (wi, w\u2212i), s\u2212i(v\u2212i); wi)]\ni (w), s\u2212i(v\u2212i); wi)] ,\n\nwhere the second to last equation is an exchange of variable names and regrouping using indepen-\ndence. Summing over populations and using smoothness of M, we get smoothness of MAG:\ni (w), s\u2212i(v\u2212i); wi)\n\ni (vi, w\u2212i), s\u2212vi)(cid:3) = Ev,w\n\n(cid:2)U AG\n\ni\u2208[n] Ui(s\u2217\n\nEw\n\n(s\u2217\n\ni,vi\n\n(cid:88)\n\n(cid:88)\n\n(cid:104)(cid:80)\n\n(cid:105)\n\ni\u2208[n]\n\nvi\u2208Vi\n\u2265 Ev,w [\u03bbOPT(w) \u2212 \u00b5R(s(v))] = \u03bbEw [OPT(w)] \u2212 \u00b5RAG(s)\n\nCorollary 13. Every BAYES-CCE of the incomplete information setting of a smooth mechanism\nM, achieves expected welfare at least\n\nmax{1,\u00b5} of the expected optimal welfare.\n\n\u03bb\n\n7 Finite Time Analysis and Convergence Rates\n\nIn the previous section we argued about the limit average ef\ufb01ciency of the game as time goes to\nin\ufb01nity. In this section we analyze the convergence rate to BAYES-CCE and we show approximate\nef\ufb01ciency results even for \ufb01nite time, when players are allowed to have some \u0001-regret.\nTheorem 14. Consider the repeated matching game with a (\u03bb, \u00b5)-smooth mechanism. 
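As a concrete illustration of the setting of Theorem 14, the toy simulation below (hypothetical Python, not from the paper; the instance and all names are ours) runs one Hedge learner per (population, type) pair in a repeated single-item first-price auction, which is (1/2, 1)-smooth, and compares the time-average welfare against the λ/max{1, μ} = 1/2 fraction of the average ex-post optimal welfare.

```python
import math
import random

# Hypothetical toy instance (not from the paper): a repeated single-item
# first-price auction between n = 2 populations, each with two equally
# likely types.  The first-price auction is (1/2, 1)-smooth, so the
# time-average welfare should be at least about half of E_v[OPT(v)].
random.seed(0)
BIDS = [b / 10 for b in range(11)]       # discretized bid space
TYPES = [[0.5, 1.0], [0.5, 1.0]]         # type sets V_1, V_2, uniform F_i
T = 5000
eta = math.sqrt(8 * math.log(len(BIDS)) / T)   # standard Hedge step size

# one Hedge weight vector per (population, type) pair -- the "agents"
weights = {(i, v): [1.0] * len(BIDS) for i in (0, 1) for v in TYPES[i]}

total_welfare = total_opt = 0.0
for _ in range(T):
    v = [random.choice(TYPES[i]) for i in (0, 1)]       # sample v^t ~ F
    # each sampled agent draws a bid from its current Hedge distribution
    bids = [BIDS[random.choices(range(len(BIDS)),
                                weights=weights[(i, v[i])])[0]]
            for i in (0, 1)]
    winner = 0 if bids[0] >= bids[1] else 1
    total_welfare += v[winner]                          # SW(s^t(v^t); v^t)
    total_opt += max(v)                                 # OPT(v^t)
    # full-information Hedge update for the two sampled agents only
    for i in (0, 1):
        opp = bids[1 - i]
        losses = [1.0 - ((v[i] - b) if b > opp else 0.0) for b in BIDS]
        weights[(i, v[i])] = [w * math.exp(-eta * l)
                              for w, l in zip(weights[(i, v[i])], losses)]

avg_welfare, avg_opt = total_welfare / T, total_opt / T
print(avg_welfare >= 0.5 * avg_opt)  # smoothness-predicted bound; prints True
```

The final comparison is exactly the λ/max{1, μ} welfare fraction that the finite-time guarantee promises, up to the δ and μ·ε slack terms in Equation (12); only the learners that are sampled in a round update, matching the population interpretation of the dynamics.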
Suppose that for any T ≥ T_0, each player in each of the n populations has regret at most ε/n. Then for every δ and ρ, there exists a T*(δ, ρ), such that for any T ≥ max{T_0, T*}, with probability 1 − ρ:

  (1/T) Σ_{t=1}^T SW(s^t(v^t); v^t) ≥ (λ/max{1, μ}) · E_v[OPT(v)] − δ − μ · ε.   (12)

Moreover, T*(δ, ρ) ≤ (54 · n^3 · |Σ| · |V|^2 · H^3 / δ^3) · log(2/ρ).

References

[1] Dirk Bergemann and Stephen Morris. Correlated equilibrium in games with incomplete information. Cowles Foundation Discussion Papers 1822, Cowles Foundation for Research in Economics, Yale University, October 2011.

[2] Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth. Regret minimization and the price of total anarchy. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC '08, pages 373–382, New York, NY, USA, 2008. ACM.

[3] Yang Cai and Christos Papadimitriou. Simultaneous Bayesian auctions and computational complexity. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC '14, pages 895–910, New York, NY, USA, 2014. ACM.

[4] Ioannis Caragiannis, Christos Kaklamanis, Panagiotis Kanellopoulos, Maria Kyropoulou, Brendan Lucier, Renato Paes Leme, and Éva Tardos. Bounding the inefficiency of outcomes in generalized second price auctions. Journal of Economic Theory, 2014.

[5] Bart de Keijzer, Evangelos Markakis, Guido Schäfer, and Orestis Telelis. Inefficiency of standard multi-unit auctions. In Hans L. Bodlaender and Giuseppe F. Italiano, editors, Algorithms – ESA 2013, volume 8125 of Lecture Notes in Computer Science, pages 385–396. Springer Berlin Heidelberg, 2013.

[6] Françoise Forges.
Five legitimate definitions of correlated equilibrium in games with incomplete information. Theory and Decision, 35(3):277–310, 1993.

[7] Dean P. Foster and Rakesh V. Vohra. Asymptotic calibration. Biometrika, 85(2):379–390, 1998.

[8] Todd R. Kaplan and Shmuel Zamir. Asymmetric first-price auctions with uniform distributions: analytic solutions to the general case. Economic Theory, 50(2):269–302, 2012.

[9] Elias Koutsoupias and Christos Papadimitriou. Worst-case equilibria. In Proceedings of the 16th Annual Conference on Theoretical Aspects of Computer Science, STACS '99, pages 404–413, Berlin, Heidelberg, 1999. Springer-Verlag.

[10] Brendan Lucier and Allan Borodin. Price of anarchy for greedy auctions. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '10, pages 537–553, Philadelphia, PA, USA, 2010. Society for Industrial and Applied Mathematics.

[11] Tim Roughgarden. Intrinsic robustness of the price of anarchy. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC '09, pages 513–522, New York, NY, USA, 2009. ACM.

[12] Vasilis Syrgkanis and Éva Tardos. Composable and efficient mechanisms. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC '13, pages 211–220, New York, NY, USA, 2013. ACM.

[13] Adrian Vetta. Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions. In Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, pages 416–425, 2002.