{"title": "Countering Feedback Delays in Multi-Agent Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 6171, "page_last": 6181, "abstract": "We consider a model of game-theoretic learning based on online mirror descent (OMD) with asynchronous and delayed feedback information. Instead of focusing on specific games, we consider a broad class of continuous games defined by the general equilibrium stability notion, which we call \u03bb-variational stability. Our first contribution is that, in this class of games, the actual sequence of play induced by OMD-based learning converges to Nash equilibria provided that the feedback delays faced by the players are synchronous and bounded. Subsequently, to tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, we propose a variant of OMD which we call delayed mirror descent (DMD), and which relies on the repeated leveraging of past information. With this modification, the algorithm converges to Nash equilibria with no feedback synchronicity assumptions and even when the delays grow superlinearly relative to the horizon of play.", "full_text": "Countering Feedback Delays in Multi-Agent Learning\n\nZhengyuan Zhou\nStanford University\n\nzyzhou@stanford.edu\n\nPanayotis Mertikopoulos\n\nUniv. Grenoble Alpes, CNRS, Inria, LIG\npanayotis.mertikopoulos@imag.fr\n\nNicholas Bambos\nStanford University\n\nPeter Glynn\n\nStanford University\n\nClaire Tomlin\nUC Berkeley\n\nbambos@stanford.edu\n\nglynn@stanford.edu\n\ntomlin@eecs.berkeley.edu\n\nAbstract\n\nWe consider a model of game-theoretic learning based on online mirror de-\nscent (OMD) with asynchronous and delayed feedback information. Instead of\nfocusing on speci\ufb01c games, we consider a broad class of continuous games de\ufb01ned\nby the general equilibrium stability notion, which we call \u03bb-variational stabil-\nity. 
Our first contribution is that, in this class of games, the actual sequence of play induced by OMD-based learning converges to Nash equilibria provided that the feedback delays faced by the players are synchronous and bounded. Subsequently, to tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, we propose a variant of OMD which we call delayed mirror descent (DMD), and which relies on the repeated leveraging of past information. With this modification, the algorithm converges to Nash equilibria with no feedback synchronicity assumptions and even when the delays grow superlinearly relative to the horizon of play.

1 Introduction

Online learning is a broad and powerful theoretical framework enjoying widespread applications and great success in machine learning, data science, operations research, and many other fields [3, 7, 22]. The prototypical online learning problem may be described as follows: at each round t = 0, 1, . . . , a player selects an action x_t from some convex, compact set, and obtains a reward u_t(x_t) based on some a priori unknown payoff function u_t. Subsequently, the player receives some feedback (e.g. the past history of the reward functions) and selects a new action x_{t+1} with the goal of maximizing the obtained reward. Aggregating over the rounds of the process, this is usually quantified by asking that the player's (external) regret Reg(T) ≡ max_{x∈X} ∑_{t=1}^T [u_t(x) − u_t(x_t)] grow sublinearly with the horizon of play T, a property known as "no regret".

One of the most widely used algorithmic schemes for learning in this context is the online mirror descent (OMD) class of algorithms [23]. 
Tracing its origins to [17] for offline optimization problems, OMD proceeds by taking a gradient step in the dual (gradient) space and projecting it back to the primal (decision) space via a mirror map generated by a strongly convex regularizer function (with different regularizers giving rise to different algorithms). In particular, OMD includes as special cases several seminal learning algorithms, such as Zinkevich's online gradient descent (OGD) scheme [29] and the multiplicative/exponential weights (EW) algorithm [1, 13]. Several variants of this class also exist and, perhaps unsurprisingly, they occur under a variety of different names – such as "Follow-the-Regularized-Leader" [9], dual averaging [18, 25], and so on.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

When u_t is concave, OMD enjoys a sublinear O(√T) regret bound which is known to be universally tight.¹ A common instantiation of this is found in repeated multi-player games, where each player's payoff function is determined by the actions of all other players via a fixed mechanism – the stage game. Even though this mechanism may be unknown to the players, the universality of the OMD regret bounds raises high expectations in terms of performance guarantees, so it is natural to assume that players adopt some variant thereof when faced with such online decision processes. This leads to the following central question: if all players of a repeated game employ an OMD updating rule, do their actions converge to a Nash equilibrium of the underlying one-shot game?

Related Work. Given the prominence of Nash equilibrium as a solution concept in game theory (compared to coarser notions such as correlated equilibria or the Hannan set), this problem lies at the heart of multi-agent learning [4]. 
However, convergence to a Nash equilibrium is, in the words of [4], "considerably more difficult" than attaining a no-regret state for all players (which leads to the weaker notion of coarse correlated equilibrium in finite games). To study this question, a growing body of literature has focused on special classes of games (e.g. zero-sum games, routing games) and established the convergence of the so-called "ergodic average" T^{−1} ∑_{t=1}^T x_t of OMD [2, 10, 12]. In general, the actual sequence of play may fail to converge altogether, even in simple, finite games [16, 24]. On the other hand, there is a number of recent works establishing the convergence of play in potential games with finite action sets under different assumptions on the number of players involved (continuous or finite) and the quality of the available feedback (perfect, semi-bandit/imperfect, or bandit/payoff-based) [5, 11, 14, 19]. However, these works focus on games with finite action sets, and feedback is assumed to be instantly available to the players (i.e. with no delays or asynchronicities) – two crucial assumptions that we do not make in this paper.

A further major challenge arises in decentralized environments (such as transportation networks), where a considerable delay often occurs between a player's action and the corresponding received feedback. To study learning in such settings, [20] recently introduced an elegant and flexible delay framework where the gradient of round t only becomes available at round t + d_t − 1, with d_t being the delay associated with the player's action at round t.² [20] then considered a very natural extension of OMD under delays: updating with the set of gradients as they are received (see Algorithm 1 for details). If the total delay after time T is D(T) = ∑_{t=1}^T d_t, [20] showed that OMD enjoys an O(D(T)^{1/2}) regret bound. 
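The delay bookkeeping in this framework is easy to make concrete. Below is a minimal sketch (function and variable names are ours, not from [20]): given a delay sequence d_s, it builds the arrival sets G_t of rounds whose gradients become available at round t, together with the total delay D(T) entering the O(D(T)^{1/2}) bound.

```python
def arrival_sets(delays):
    """Map each round s to the round s + d_s - 1 at which its gradient arrives.

    delays[s] is the (positive integer) delay d_s of the action played at
    round s; with d_s = 1 the gradient is available in the same round.
    Returns a dict G where G[t] is the set of rounds whose gradients become
    available at round t (the sets G_t of the delay model).
    """
    G = {}
    for s, d in enumerate(delays):
        G.setdefault(s + d - 1, set()).add(s)
    return G

# Example: round 0 is delayed by 3 rounds, rounds 1-3 are undelayed, so
# round 0's gradient arrives after round 1's -- i.e. out of order.
delays = [3, 1, 1, 1]
G = arrival_sets(delays)   # {2: {0, 2}, 1: {1}, 3: {3}}
D = sum(delays)            # total delay D(T) in the O(D(T)^(1/2)) bound
```

Note how a single large early delay makes feedback arrive out of order, exactly the situation the framework permits.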
This natural extension has several strengths: first, no assumption is made on how the gradients are received (the delayed gradients can arrive out of order); further, as pointed out in [6, 8], a gradient "does not need to be timestamped by the round s from which it originates," as required for example by the pooling strategies of [6, 8].

Our Contributions. Our investigations here differ from existing work in the following aspects. First, we consider learning in games with asynchronous and delayed feedback by extending the general single-agent feedback delay framework introduced in [20]. Previous work on the topic has focused on the regret analysis of single-agent learning with delays, but the convergence properties of such processes in continuous games are completely unknown. Second, we focus throughout on the convergence of the actual sequence of play generated by OMD (its "last iterate" in the parlance of optimization), as opposed to the algorithm's ergodic average (1/T) ∑_{t=1}^T x_t. This last point is worth emphasizing for several reasons: a) this mode of convergence is stronger and theoretically more appealing because it implies ergodic convergence; b) in a game-theoretic setting, payoffs are determined by the actual sequence of play, so ergodic convergence diminishes in value if it is not accompanied by similar conclusions for the players' realized actions; and c) because there is no inherent averaging, the techniques used to prove convergence of x_t provide a much finer understanding of the evolution of OMD.

The starting point of our paper is the introduction of an equilibrium stability notion which we call λ-variational stability, a notion that is motivated by the concept of evolutionary stability in population games and builds on the characterization of stable Nash equilibria as solutions to a Minty-type variational inequality [15]. 
This stability notion is intimately related to monotone operators in variational analysis [21] and can be seen as a strict generalization of operator monotonicity in the current game-theoretic context.³ By means of this notion, we are able to treat convergence questions in general games with continuous action spaces, without having to focus on a specific class of games – such as concave potential or strictly monotone games (though our analysis also covers such games).

¹ In many formulations, a cost function (as opposed to a reward function) is used, in which case such cost functions need to be convex.
² Of course, taking d_t = 1 yields the classical no-delay setting.

Our first result is that, assuming variational stability, the sequence of play induced by OMD converges to the game's set of Nash equilibria, provided that the delays of all players are synchronous and bounded (see Theorems 4.3 and 4.4). As an inherited benefit, players adopting this learning algorithm can receive gradients out of order and do not need to keep track of the timestamps from which the gradients originate. In fact, even in the special case of learning without delays, we are not aware of a similar convergence result for the actual sequence of play.

An important limitation of this result is that delays are assumed synchronous and bounded, an assumption which might not hold in large, decentralized environments. To lift this barrier, we introduce a modification of vanilla OMD which we call delayed mirror descent (DMD), and which leverages past information repeatedly, even in rounds where players receive no feedback. 
Thanks to this modification, play under DMD converges to variationally stable sets of Nash equilibria (Theorem 5.2), even if the players experience asynchronous and unbounded delays: in particular, delays could grow superlinearly in the game's horizon, and DMD would still converge.

We mention that the convergence proofs for both OMD and DMD rely on designing a particular Lyapunov function, the so-called λ-Fenchel coupling, which serves as a "primal-dual divergence" measure between actions and gradient variables. Thanks to its Lyapunov properties, the λ-Fenchel coupling provides a potent tool for proving convergence and we exploit it throughout. Further, we present a unified theoretical framework that puts the analysis of both algorithms under different delay assumptions on the same footing.

2 Problem Setup

2.1 Games with Continuous Action Sets

We start with the definition of a game with continuous action sets, which serves as a stage game and provides a reward function for each player in an online learning process.

Definition 2.1. A continuous game G is a tuple G = (N, X = ∏_{i=1}^N X_i, {u_i}_{i=1}^N), where N is the set of N players {1, 2, . . . , N}, X_i is a compact convex subset of some finite-dimensional vector space R^{d_i} representing the action space of player i, and u_i : X → R is the i-th player's payoff function.

Regarding the players' payoff functions, we make the following assumptions throughout:

1. For each i ∈ N, u_i(x) is continuous in x.
2. For each i ∈ N, u_i is continuously differentiable in x_i and the partial gradient ∇_{x_i} u_i(x) is Lipschitz continuous in x.

Throughout the paper, x_{−i} denotes the joint action of all players but player i. Consequently, the joint action⁴ x will frequently be written as (x_i, x_{−i}). Two important quantities in the current context are:

Definition 2.2. 
We let v(x) be the profile of the players' individual payoff gradients,⁵ i.e. v(x) = (v_1(x), . . . , v_N(x)), where v_i(x) ≜ ∇_{x_i} u_i(x).

Definition 2.3. Given a continuous game G, x* ∈ X is called a (pure-strategy) Nash equilibrium if for each i ∈ N, u_i(x*_i, x*_{−i}) ≥ u_i(x_i, x*_{−i}) for all x_i ∈ X_i.

³ In the supplement, we give two well-known classes of games that satisfy this equilibrium notion.
⁴ Note that boldfaced letters are only used to denote joint actions. In particular, x_i is a vector even though it is not boldfaced.
⁵ Note that per the last assumption in the definition of a concave game (Definition 2.1), the gradient v(x) always exists and is a continuous function on the joint action space X.

2.2 Online Mirror Descent in Games under Delays

In what follows, we consider a general multi-agent delay model extending the single-agent delay model of [20] to the multi-agent learning case. At a high level, for each agent there can be an arbitrary delay between the stage at which an action is played and the stage at which feedback is received about said action (typically in the form of gradient information). There is no extra assumption imposed on the feedback delays – in particular, feedback can arrive out of order and in a completely asynchronous manner across agents. Further, the received feedback is not timestamped – so a player might not know to which iteration a specific piece of feedback corresponds.

When OMD is applied in this setting, we obtain the following scheme:

Algorithm 1 Online Mirror Descent on Games under Delays
1: Each player i chooses an initial y⁰_i.
2: for t = 0, 1, 2, . . . do
3:   for i = 1, . . . , N do
4:     x^t_i = arg max_{x_i ∈ X_i} {⟨y^t_i, x_i⟩ − h_i(x_i)}
5:     y^{t+1}_i = y^t_i + α_t ∑_{s ∈ G^t_i} v_i(x^s)
6:   end for
7: end for

Three comments are in order here. 
First, each h_i is a regularizer on X_i, as defined below:

Definition 2.4. Let D be a compact and convex subset of R^m. We say that g : D → R is a regularizer if g is continuous and strongly convex on D, i.e. there exists some K > 0 such that

g(td + (1 − t)d′) ≤ t g(d) + (1 − t) g(d′) − (K/2) t(1 − t) ‖d′ − d‖²   (2.1)

for all t ∈ [0, 1] and all d, d′ ∈ D.

Second, the gradient step size α_t in Algorithm 1 can be any positive and non-increasing sequence that satisfies the standard summability assumption: ∑_{t=0}^∞ α_t = ∞ and ∑_{t=0}^∞ (α_t)² < ∞.

Third, regarding the delay model: in Algorithm 1, G^t_i denotes the set of rounds whose gradients become available to player i at the current round t. Denote by d^s_i (a positive integer) player i's delay for the gradient of round s; this gradient v_i(x^s) then becomes available at round s + d^s_i − 1, i.e. s ∈ G^{s + d^s_i − 1}_i. In particular, if d^s_i = 1 for all s, player i doesn't experience any feedback delays. Note here again that each player can receive feedback out of order: this can happen if the gradient of an earlier round has a much larger delay than that of the gradient of a later round.

3 λ-Variational Stability: A Key Criterion

In this section, we define a key stability notion, called λ-variational stability. This notion allows us to obtain strong convergence results for the induced sequence of play, as opposed to results that only hold in specific classes of games. The supplement provides two detailed special classes of games (convex potential games and asymmetric Cournot oligopolies) that admit variationally stable equilibria. 
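Before turning to the stability notion itself, it is useful to see Algorithm 1 and the three ingredients just described (regularizer, step size, delay sets) in concrete form. The following is a minimal sketch, not the paper's code: it assumes two players on [0, 1] with the Euclidean regularizer h_i(x_i) = ½x_i² (so the mirror step arg max {⟨y, x⟩ − h(x)} reduces to a projection) and a hypothetical quadratic game of our own whose unique Nash equilibrium is (2/3, 2/3).

```python
import numpy as np

# Hypothetical 2-player game on [0,1]^2: payoff gradients
# v_i(x) = 1 - x_i - 0.5*x_{-i}.  The operator v is monotone and the
# unique Nash equilibrium solves M x* = (1, 1), i.e. x* = (2/3, 2/3).
def v(x):
    return np.array([1 - x[0] - 0.5 * x[1], 1 - x[1] - 0.5 * x[0]])

def choice(y):
    # With h(x) = 0.5*x^2, arg max_x { <y, x> - h(x) } over [0, 1]
    # is simply the projection (clipping) of y onto [0, 1].
    return np.clip(y, 0.0, 1.0)

def omd_delayed(delays, T, step=lambda t: 1.0 / (t + 1) ** 0.6):
    """Algorithm 1 sketch: both players face the same delay sequence
    (synchronous delays), and each arriving gradient is applied with
    the step size of the round at which it arrives."""
    arrivals = {}
    for s, d in enumerate(delays):        # gradient of round s arrives at s+d-1
        arrivals.setdefault(s + d - 1, []).append(s)
    y = np.zeros(2)
    xs = []
    for t in range(T):
        x = choice(y)
        xs.append(x)
        for s in arrivals.get(t, []):     # G_t: rounds whose feedback arrives now
            y = y + step(t) * v(xs[s])
    return xs[-1]

rng = np.random.default_rng(0)
delays = rng.integers(1, 6, size=2000)    # synchronous, bounded by D = 5
x_last = omd_delayed(delays, T=2000)      # last iterate approaches (2/3, 2/3)
```

The step size (t + 1)^{-0.6} satisfies the summability assumption above, and the bounded synchronous delays match the setting of Section 4 below.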
Other examples include monotone games (discussed later in this section), pseudo-monotone games [28], non-atomic routing games [26, 27], symmetric influence network games [11] and many others.

3.1 λ-Variational Stability

Definition 3.1. Given a game with continuous actions (N, X = ∏_{i=1}^N X_i, {u_i}_{i=1}^N), a set C ⊂ X is called λ-variationally stable for some λ ∈ R^N_{++} if

∑_{i=1}^N λ_i ⟨v_i(x), x_i − x*_i⟩ ≤ 0   for all x ∈ X, x* ∈ C,   (3.1)

with equality if and only if x ∈ C.

Remark 3.1. If C is λ-stable with λ_i = 1 for all i, it is called simply stable [15].

We emphasize that in a game setting, λ-variational stability is more general than an important concept called operator monotonicity in variational analysis. Specifically, v(·) is called a monotone operator [21] if the following holds (with equality if and only if x = x̃):

⟨v(x) − v(x̃), x − x̃⟩ ≜ ∑_{i=1}^N ⟨v_i(x) − v_i(x̃), x_i − x̃_i⟩ ≤ 0   for all x, x̃ ∈ X.   (3.2)

If v(·) is monotone, the game admits a unique Nash equilibrium x* which (per the property of a Nash equilibrium) satisfies ⟨v(x*), x − x*⟩ ≤ 0. Consequently, if v(·) is a monotone operator, it follows that ⟨v(x), x − x*⟩ ≤ ⟨v(x*), x − x*⟩ ≤ 0, where equality is achieved if and only if x = x*. This implies that when v(x) is a monotone operator, the singleton set of the unique Nash equilibrium is 1-variationally stable, where 1 is the all-ones vector. 
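The chain of inequalities above is easy to verify numerically on a toy example. The sketch below (our own illustration, not from the paper) uses a two-player quadratic game on [0, 1]² with v(x) = c − Mx for a positive-definite M, so that v is monotone, and checks both the monotonicity inequality (3.2) and the 1-variational stability inequality (3.1) at random points.

```python
import numpy as np

# Toy game on X = [0,1]^2: v(x) = c - M x with M positive definite, so
# <v(x) - v(x~), x - x~> = -(x - x~)^T M (x - x~) <= 0 (monotonicity).
M = np.array([[1.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, 1.0])
v = lambda x: c - M @ x
x_star = np.linalg.solve(M, c)            # interior Nash equilibrium (2/3, 2/3)

rng = np.random.default_rng(1)
for _ in range(1000):
    x, x2 = rng.random(2), rng.random(2)  # random points of X
    # monotonicity inequality (3.2)
    assert (v(x) - v(x2)) @ (x - x2) <= 1e-12
    # 1-variational stability (3.1) of the singleton {x*}
    assert v(x) @ (x - x_star) <= 1e-12
```

Both inequalities hold with equality only at x = x̃ (respectively x = x*), matching the definitions.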
The converse is not true: when v(x) is not a monotone operator, we can still have a unique Nash equilibrium that is λ-variationally stable, or more generally, a λ-variationally stable set C.

3.2 Properties of λ-Variational Stability

Lemma 3.2. If C is nonempty and λ-stable, then it is closed, convex and contains all Nash equilibria of the game.

The following lemma gives a convenient sufficient condition ensuring that a singleton λ-variationally stable set {x*} exists; in this case, we simply say that x* is λ-variationally stable.

Lemma 3.3. Consider a game with continuous actions (N, X = ∏_{i=1}^N X_i, {u_i}_{i=1}^N), where each u_i is twice continuously differentiable. For each x ∈ X, define the λ-weighted Hessian matrix H^λ(x) blockwise as follows:

H^λ_{ij}(x) = (1/2) λ_i ∇_{x_j} v_i(x) + (1/2) λ_j (∇_{x_i} v_j(x))^T.   (3.3)

If H^λ(x) is negative-definite for every x ∈ X, then the game admits a unique Nash equilibrium x* that is globally λ-variationally stable.

Remark 3.2. It is important to note that the Hessian matrix so defined is a block matrix: each H^λ_{ij}(x) is a d_i × d_j matrix. Writing it in terms of the utility functions, we have H^λ_{ij}(x) = (1/2) λ_i ∇_{x_j} ∇_{x_i} u_i(x) + (1/2) λ_j (∇_{x_i} ∇_{x_j} u_j(x))^T.

4 Convergence under Synchronous and Bounded Delays

In this section, we tackle the convergence of the last iterate of OMD under delays. We start by defining an important divergence measure, the λ-Fenchel coupling, which generalizes the Bregman divergence. We then establish the properties of this coupling that play an indispensable role in both this section and the next.

4.1 λ-Fenchel Coupling

Definition 4.1. 
Fix a game with continuous action spaces (N, X = ∏_{i=1}^N X_i, {u_i}_{i=1}^N) and, for each player i, let h_i : X_i → R be a regularizer that is K_i-strongly convex with respect to the norm ‖·‖_i.

1. The convex conjugate h*_i : R^{d_i} → R of h_i is defined as: h*_i(y_i) = max_{x_i ∈ X_i} {⟨x_i, y_i⟩ − h_i(x_i)}.

2. The choice function C_i : R^{d_i} → X_i associated with the regularizer h_i of player i is defined as: C_i(y_i) = arg max_{x_i ∈ X_i} {⟨x_i, y_i⟩ − h_i(x_i)}.

3. For λ ∈ R^N_{++}, the λ-Fenchel coupling F^λ : X × R^{∑_{i=1}^N d_i} → R is defined as: F^λ(x, y) = ∑_{i=1}^N λ_i (h_i(x_i) − ⟨x_i, y_i⟩ + h*_i(y_i)).

Note that although the domain of h_i is X_i ⊂ R^{d_i}, the domain of its conjugate (the gradient space) is R^{d_i}. The two key properties of the λ-Fenchel coupling that will be important in establishing the convergence of OMD are given next.

Lemma 4.2. For each i ∈ {1, . . . , N}, let h_i : X_i → R be a regularizer that is K_i-strongly convex with respect to the norm ‖·‖_i, and let λ ∈ R^N_{++}. Then for all x ∈ X and all ỹ, y ∈ R^{∑_{i=1}^N d_i}:

1. F^λ(x, y) ≥ (1/2) ∑_{i=1}^N K_i λ_i ‖C_i(y_i) − x_i‖²_i ≥ (1/2) (min_i K_i λ_i) ∑_{i=1}^N ‖C_i(y_i) − x_i‖²_i.

2. F^λ(x, ỹ) ≤ F^λ(x, y) + ∑_{i=1}^N λ_i ⟨ỹ_i − y_i, C_i(y_i) − x_i⟩ + (1/2) (max_i λ_i/K_i) ∑_{i=1}^N (‖ỹ_i − y_i‖*_i)²,

where ‖·‖*_i is the dual norm of ‖·‖_i (i.e. ‖y_i‖*_i = max_{‖x_i‖_i ≤ 1} ⟨x_i, y_i⟩).

Remark 4.1. Collecting the individual choice maps into a vector, we obtain the aggregate choice map C : R^{∑_{i=1}^N d_i} → X, with C(y) = (C_1(y_1), . . . , C_N(y_N)). 
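For intuition, all three objects in Definition 4.1 have closed forms for the Euclidean regularizer h(x) = ½‖x‖² on a box, where the choice map is coordinatewise clipping. The sketch below (our own, with λ_i = 1 and K_i = 1) computes the coupling and checks the first bound of Lemma 4.2 numerically.

```python
import numpy as np

# Euclidean regularizer h(x) = 0.5*||x||^2 on X = [0,1]^d; h is
# 1-strongly convex, so K = 1 in Lemma 4.2.
h = lambda x: 0.5 * np.dot(x, x)

def choice(y):
    # C(y) = arg max_x { <x, y> - h(x) } over the box = clipping of y
    return np.clip(y, 0.0, 1.0)

def h_conj(y):
    # h*(y) = <C(y), y> - h(C(y)), by definition of the convex conjugate
    x = choice(y)
    return np.dot(x, y) - h(x)

def fenchel_coupling(x, y, lam=1.0):
    # F(x, y) = lam * (h(x) - <x, y> + h*(y))  (one player's summand)
    return lam * (h(x) - np.dot(x, y) + h_conj(y))

# Statement 1 of Lemma 4.2: F(x, y) >= (K/2) * ||C(y) - x||^2
x = np.array([0.2, 0.7, 0.5])
y = np.array([2.0, -1.0, 0.4])
assert fenchel_coupling(x, y) >= 0.5 * np.sum((choice(y) - x) ** 2)
# ...and the coupling vanishes exactly when x = C(y):
assert abs(fenchel_coupling(choice(y), y)) < 1e-12
```

The second assertion is the sense in which the coupling acts as a primal-dual "distance" between an action x and a gradient variable y.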
Since each space X_i is endowed with the norm ‖·‖_i, we can define the induced aggregate norm ‖·‖ on the joint space X by ‖x‖ = ∑_{i=1}^N ‖x_i‖_i. We can similarly define the aggregate dual norm: ‖y‖* = ∑_{i=1}^N ‖y_i‖*_i. Henceforth, convergence in the joint space (e.g. C(y^t) → x, y^t → y) is understood with respect to the corresponding aggregate norm.

Finally, we assume throughout the paper that the choice maps are regular in the following (very weak) sense: a choice map C(·) is said to be λ-Fenchel coupling conforming if

C(y^t) → x implies F^λ(x, y^t) → 0 as t → ∞.   (4.1)

Unless one aims for relatively pathological cases, choice maps induced by typical regularizers are always λ-Fenchel coupling conforming: examples include the Euclidean and entropic regularizers.

4.2 Convergence of OMD to Nash Equilibrium

We start by characterizing the assumption on the delay model:

Assumption 1. The delays are assumed to be:

1. Synchronous: G^t_i = G^t_j for all i, j and all t.
2. Bounded: d^t_i ≤ D for all i and all t (for some positive integer D).

Theorem 4.3. Fix a game with continuous action spaces (N, X = ∏_{i=1}^N X_i, {u_i}_{i=1}^N) that admits x* as the unique Nash equilibrium that is λ-variationally stable. Under Assumption 1, the OMD iterate x^t given in Algorithm 1 converges to x*, irrespective of the initial point x⁰.

Remark 4.2. The proof is rather long and involved. 
To aid understanding and build intuition, we break it down into four main steps, each of which is proved in detail in the appendix.

1. Since the delays are synchronous, we denote by G^t the common arrival set and by d^t the common delay at round t. The gradient update in OMD under delays can then be written as:

y^{t+1}_i = y^t_i + α_t ∑_{s ∈ G^t} v_i(x^s) = y^t_i + α_t { |G^t| v_i(x^t) + ∑_{s ∈ G^t} [v_i(x^s) − v_i(x^t)] }.   (4.2)

Define b^t_i = ∑_{s ∈ G^t} [v_i(x^s) − v_i(x^t)]. We show that lim_{t→∞} ‖b^t_i‖*_i = 0 for each player i.

2. Define b^t = (b^t_1, . . . , b^t_N); we have lim_{t→∞} b^t = 0 per Claim 1. Since each player's gradient update can be written as y^{t+1}_i = y^t_i + α_t (|G^t| v_i(x^t) + b^t_i) per Claim 1, we can write the joint OMD update (over all players) as:

x^t = C(y^t),   (4.3)
y^{t+1} = y^t + α_t { |G^t| v(x^t) + b^t }.   (4.4)

Let B(x*, ε) ≜ {x ∈ X | ‖x − x*‖ < ε} be the open ball of radius ε centered at x*. Then, using the λ-Fenchel coupling as an "energy" function and leveraging the handle on b^t given by Claim 1, we can establish that, for any ε > 0, the iterate x^t eventually enters B(x*, ε) and visits B(x*, ε) infinitely often, no matter what the initial point x⁰ is. Mathematically, the claim is that for all ε > 0 and all x⁰, |{t | x^t ∈ B(x*, ε)}| = ∞.

3. Fix any δ > 0 and consider the set B̃(x*, δ) ≜ {C(y) | F^λ(x*, y) < δ}. In other words, B̃(x*, δ) is a "neighborhood" of x* containing every x that is the image (under the choice map C(·)) of some y within δ of x* under the λ-Fenchel coupling "metric". 
Although F^λ(x*, y) is not a metric, B̃(x*, δ) contains an open ball. Mathematically, the claim is that for any δ > 0 there exists ε(δ) > 0 such that B(x*, ε) ⊂ B̃(x*, δ).

4. For any "neighborhood" B̃(x*, δ), after long enough rounds, if x^t ever enters B̃(x*, δ), it remains trapped inside B̃(x*, δ) thereafter. Mathematically, the claim is that for any δ > 0 there exists T(δ) such that for any t ≥ T(δ), if x^t ∈ B̃(x*, δ), then x^t̃ ∈ B̃(x*, δ) for all t̃ ≥ t.

Putting all four elements together, we note that the significance of Claim 3 is that, since the iterate x^t enters B(x*, ε) infinitely often (per Claim 2), x^t must enter B̃(x*, δ) infinitely often. It therefore follows, per Claim 4, that from some iteration onwards, x^t remains in B̃(x*, δ). Since this is true for any δ > 0, we have F^λ(x*, y^t) → 0 as t → ∞. Per Statement 1 of Lemma 4.2, this yields ‖C(y^t) − x*‖ → 0 as t → ∞, thereby establishing that x^t = C(y^t) → x* as t → ∞.

In fact, the result generalizes straightforwardly to multiple Nash equilibria. The proof of the set-convergence case is line-by-line identical, provided we redefine, in the standard way, every quantity that measures the distance between two points as the corresponding quantity measuring the distance between a point and a set (by taking the infimum over points of that set). We directly state the result below.

Theorem 4.4. 
Fix a game with continuous action spaces (N, X = ∏_{i=1}^N X_i, {u_i}_{i=1}^N) that admits X* as a λ-variationally stable set (necessarily consisting of all Nash equilibria) for some λ ∈ R^N_{++}. Under Assumption 1, the OMD iterate x^t given in Algorithm 1 satisfies lim_{t→∞} dist(x^t, X*) = 0, irrespective of x⁰, where dist(·, ·) is the standard point-to-set distance function induced by the norm ‖·‖.

5 Delayed Mirror Descent: Asynchronous and Unbounded Delays

The synchronous and bounded delay requirement of Assumption 1 is fairly strong. In this section, via a simple modification of OMD, we propose a new learning algorithm, called delayed mirror descent (DMD), that allows the last-iterate convergence-to-Nash result to be generalized to arbitrary asynchronous delays among players as well as unbounded delay growth.

5.1 Delayed Mirror Descent in Games

The main idea behind the modification is that when player i does not receive any gradient at round t, instead of performing no gradient update as in OMD, he uses the most recent set of gradients to perform an update. More formally, define the most recent information set⁶ as:

G̃^t_i = G^t_i if G^t_i ≠ ∅, and G̃^t_i = G̃^{t−1}_i if G^t_i = ∅.

Under this definition, Delayed Mirror Descent is given in Algorithm 2 (note that G̃^t_i is always non-empty there). We only make the following assumption on the delays:

Assumption 2. For each player i, lim_{t→∞} ∑_{s = min G̃^t_i}^{t} α_s = 0.

This assumption essentially requires that no player's delays grow too fast. Note that, in particular, the players' delays can be arbitrarily asynchronous. To make this assumption more concrete, we next give two more explicit delay conditions that satisfy the main delay assumption. 
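Before stating these conditions, the DMD modification itself is easy to sketch: in rounds with no incoming feedback, a player simply reuses the most recent information set G̃^t_i, averaging the gradients it contains. The code below is our own toy illustration (two players on [0, 1], Euclidean regularizer, and a hypothetical quadratic game with unique Nash equilibrium (2/3, 2/3)), with arrival schedules that are deliberately asynchronous across the players.

```python
import numpy as np

# Toy 2-player game on [0,1]^2 (ours, not the paper's): payoff gradients
# v_i(x) = 1 - x_i - 0.5*x_{-i}, a monotone operator with unique Nash
# equilibrium x* = (2/3, 2/3).
def v(x):
    return np.array([1 - x[0] - 0.5 * x[1], 1 - x[1] - 0.5 * x[0]])

choice = lambda y: np.clip(y, 0.0, 1.0)   # Euclidean mirror step = projection

def dmd(arrivals, T, alpha):
    """DMD sketch. arrivals[i][t] lists the rounds whose gradients player i
    receives at round t; in rounds with no feedback, the most recent
    information set (the G~ of the text) is reused."""
    y = np.zeros(2)
    xs = []
    recent = [None, None]
    for t in range(T):
        x = choice(y)
        xs.append(x)
        for i in (0, 1):
            if arrivals[i].get(t):
                recent[i] = arrivals[i][t]
            if recent[i]:                 # average over G~_i, reused if no news
                g = np.mean([v(xs[s])[i] for s in recent[i]])
                y[i] += alpha(t) * g
    return xs[-1]

# Player 0 has undelayed feedback; player 1 only receives batched feedback
# every third round -- asynchronous, yet the last iterate homes in on x*.
arr0 = {t: [t] for t in range(3000)}
arr1 = {t: [t - 2, t - 1, t] for t in range(2, 3000, 3)}
x_last = dmd([arr0, arr1], 3000, lambda t: (t + 1) ** -0.6)
```

Note the (α_t / |G̃^t_i|) averaging: unlike Algorithm 1, stale gradients are reused in every round, which is what makes the scheme robust to rounds without feedback.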
As made formal by the following lemma, if the delays are bounded (but not necessarily synchronous), then Assumption 2 is satisfied. Furthermore, by appropriately choosing the sequence α_t, Assumption 2 can accommodate delays that are unbounded and grow super-linearly.

⁶ There may not be any gradient information in the first few rounds due to delays. Without loss of generality, we can always start at the first round at which there is non-empty gradient information, or equivalently, assume that some gradient is available at t = 0.

Algorithm 2 Delayed Mirror Descent on Games
1: Each player i chooses an initial y⁰_i.
2: for t = 0, 1, 2, . . . do
3:   for i = 1, . . . , N do
4:     x^t_i = arg max_{x_i ∈ X_i} {⟨y^t_i, x_i⟩ − h_i(x_i)}
5:     y^{t+1}_i = y^t_i + (α_t / |G̃^t_i|) ∑_{s ∈ G̃^t_i} v_i(x^s)
6:   end for
7: end for

Lemma 5.1. Let {d^s_i}_{s=1}^∞ be the delay sequences of player i.

1. If each player i's delays are bounded (i.e. there exists d ∈ Z such that d^s_i ≤ d for all s), then Assumption 2 is satisfied for any positive, non-increasing, not-summable-but-square-summable sequence {α_t}.

2. There exists a positive, non-increasing, not-summable-but-square-summable sequence (e.g. α_t = 1/(t log t log log t)) such that if d^s_i = O(s log s) for all i, then Assumption 2 is satisfied.

Proof: We will only prove Statement 2, the more interesting case. Take α_t = 1/(t log t log log t), which is obviously positive, non-increasing and square-summable. Since ∫_4^t ds/(s log s log log s) = log log log t − log log log 4 → ∞ as t → ∞, α_t is not summable. Next, let G̃^t_i be given and let t̃ be the most recent round (up to and including t) such that G^t̃_i is not empty. 
This means:
$$\tilde{\mathcal{G}}_i^t = \mathcal{G}_i^{\tilde{t}}, \qquad \mathcal{G}_i^k = \varnothing \;\; \forall k \in (\tilde{t}, t]. \tag{5.1}$$
Note that since the gradient at time $\tilde{t}$ will be available at time $\tilde{t} + d_i^{\tilde{t}} - 1$, it follows that
$$t - \tilde{t} \le d_i^{\tilde{t}}. \tag{5.2}$$
Note that this implies $\tilde{t} \to \infty$ as $t \to \infty$: otherwise $\tilde{t}$ would be bounded, making the right-hand side $d_i^{\tilde{t}}$ bounded, which contradicts the left-hand side diverging to infinity.
Since $d_i^s = O(s \log s)$, it follows that $d_i^s \le K s \log s$ for some $K > 0$. Consequently, Equation (5.2) implies $t \le \tilde{t} + K \tilde{t} \log \tilde{t}$. Denote $s_{\min}^t = \min \mathcal{G}_i^{\tilde{t}}$; Equation (5.1) implies that $s_{\min}^t = \min \tilde{\mathcal{G}}_i^t$, thereby yielding $s_{\min}^t + d_i^{s_{\min}^t} - 1 = \tilde{t}$. Therefore:
$$d_i^{s_{\min}^t} = \tilde{t} - s_{\min}^t + 1. \tag{5.3}$$
Equation (5.3) implies that $s_{\min}^t \to \infty$ as $t \to \infty$: otherwise the left-hand side of (5.3) would be bounded while the right-hand side goes to infinity (since $\tilde{t} \to \infty$ as $t \to \infty$, as established earlier).
With the above notation, it follows that:
$$\lim_{t\to\infty} \sum_{s=\min \tilde{\mathcal{G}}_i^t}^{t} \alpha_s = \lim_{t\to\infty} \Bigg\{ \sum_{s=s_{\min}^t}^{\tilde{t}} \alpha_s + \sum_{s=\tilde{t}+1}^{t} \alpha_s \Bigg\} \tag{5.4}$$
$$\le \lim_{t\to\infty} \Big\{ d_i^{s_{\min}^t}\, \alpha_{s_{\min}^t} + (K \tilde{t} \log \tilde{t})\, \alpha_{\tilde{t}+1} \Big\} \tag{5.5}$$
$$\le \lim_{t\to\infty} \Bigg\{ \frac{K s_{\min}^t \log(s_{\min}^t)}{s_{\min}^t \log(s_{\min}^t) \log\log(s_{\min}^t)} + \frac{K \tilde{t} \log \tilde{t}}{(\tilde{t}+1) \log(\tilde{t}+1) \log\log(\tilde{t}+1)} \Bigg\} \tag{5.6}$$
$$\le \lim_{t\to\infty} \Bigg\{ \frac{K}{\log\log(s_{\min}^t)} + \frac{K}{\log\log(\tilde{t}+1)} \Bigg\} \tag{5.7}$$
$$= 0. \tag{5.8}$$
Here, (5.5) holds because the first sum has $\tilde{t} - s_{\min}^t + 1 = d_i^{s_{\min}^t}$ terms, each at most $\alpha_{s_{\min}^t}$, while the second has $t - \tilde{t} \le K \tilde{t} \log \tilde{t}$ terms, each at most $\alpha_{\tilde{t}+1}$. $\blacksquare$

Remark 5.1. The proof of the second claim of Lemma 5.1 indicates that one can also easily obtain slightly larger delay growth rates, such as $O(t \log t \log\log t)$, $O(t \log t \log\log t \log\log\log t)$, and so on, by choosing the corresponding step-size sequences. Further, it is conceivable that one can identify meaningfully larger delay growth rates that still satisfy Assumption 2, particularly under more restrictions on the degree of delay asynchrony among the players. We leave that for future work.

5.2 Convergence of DMD to Nash Equilibrium

Theorem 5.2. Fix a game with continuous action spaces $(\mathcal{N}, \mathcal{X} = \prod_{i=1}^N \mathcal{X}_i, \{u_i\}_{i=1}^N)$ that admits $x^*$ as the unique Nash equilibrium that is $\lambda$-variationally stable. Under Assumption 2, the DMD iterate $x^t$ given in Algorithm 2 converges to $x^*$, irrespective of the initial point $x^0$.

The proof uses a framework similar to the one in Remark 4.2, although the details are somewhat different. Building on the notation and arguments given in Remark 4.2, we again outline three main ingredients that together establish the result. Detailed proofs are omitted due to space limitations.

1. The gradient update in DMD can be rewritten as:
$$y_i^{t+1} = y_i^t + \frac{\alpha_t}{|\tilde{\mathcal{G}}_i^t|} \sum_{s \in \tilde{\mathcal{G}}_i^t} v_i(x^s) = y_i^t + \alpha_t v_i(x^t) + \alpha_t \sum_{s \in \tilde{\mathcal{G}}_i^t} \frac{v_i(x^s) - v_i(x^t)}{|\tilde{\mathcal{G}}_i^t|}.$$
By defining $b_i^t = \sum_{s \in \tilde{\mathcal{G}}_i^t} \frac{v_i(x^s) - v_i(x^t)}{|\tilde{\mathcal{G}}_i^t|}$, we can write player $i$'s gradient update as:
$$y_i^{t+1} = y_i^t + \alpha_t \big(v_i(x^t) + b_i^t\big).$$
By bounding $b_i^t$'s magnitude using the delay sequence, Assumption 2 allows us to establish that $b_i^t$ has negligible impact over time.
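For concreteness, the gradient step and its decomposition above can be sketched in code. The snippet below is a minimal one-dimensional illustration, assuming a Euclidean regularizer (so the mirror step reduces to clipping) and made-up gradient values; the function names and numbers are hypothetical, not from the paper:

```python
def mirror_step(y, lo=0.0, hi=1.0):
    # With the Euclidean regularizer h(x) = x**2 / 2 on [lo, hi], the
    # choice map argmax_x { y*x - h(x) } reduces to clipping y to [lo, hi].
    return max(lo, min(hi, y))

def dmd_gradient_step(y, alpha_t, delayed_grads, current_grad):
    # delayed_grads holds the gradients v_i(x^s) for the rounds s whose
    # feedback has arrived by round t (assumed nonempty).
    avg = sum(delayed_grads) / len(delayed_grads)
    # Decomposition from ingredient 1: the averaged delayed gradient
    # equals the current gradient v_i(x^t) plus a perturbation b_i^t.
    b_t = avg - current_grad
    y_next = y + alpha_t * (current_grad + b_t)  # identical to y + alpha_t * avg
    return y_next, b_t

# Toy usage with made-up numbers (purely illustrative):
y, alpha_t = 0.3, 0.1
delayed = [0.8, 1.0]   # stale gradients whose feedback arrived at round t
current = 0.9          # the (typically unobserved) current gradient v_i(x^t)
y_next, b_t = dmd_gradient_step(y, alpha_t, delayed, current)
x_next = mirror_step(y_next)
```

The convergence argument hinges on $b_i^t$ vanishing over time; in this toy run the stale gradients happen to average to the current one, so the perturbation is zero.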
Mathematically, the claim is that $\lim_{t\to\infty} \|b_i^t\|_{i,*} = 0$.

2. The joint DMD update can be written as:
$$x^t = C(y^t), \tag{5.9}$$
$$y^{t+1} = y^t + \alpha_t \big(v(x^t) + b^t\big). \tag{5.10}$$
Here, again using the $\lambda$-Fenchel coupling as an "energy" function and leveraging the handle on $b^t$ given by Claim 1, we show that for any $\epsilon > 0$ the iterate $x^t$ will eventually enter $B(x^*, \epsilon)$ and visit $B(x^*, \epsilon)$ infinitely often, no matter what the initial point $x^0$ is. Furthermore, per Claim 3 in Remark 4.2, $B(x^*, \epsilon) \subset \tilde{B}(x^*, \delta)$. This implies that $x^t$ must enter $\tilde{B}(x^*, \delta)$ infinitely often.

3. Again using the $\lambda$-Fenchel coupling, we show that under DMD, for any "neighborhood" $\tilde{B}(x^*, \delta)$, after sufficiently many iterations, if $x^t$ ever enters $\tilde{B}(x^*, \delta)$, it will be trapped inside $\tilde{B}(x^*, \delta)$ thereafter.

Combining the above three elements, it follows that under DMD, from some iteration on, $x^t$ remains in $\tilde{B}(x^*, \delta)$. Since this is true for any $\delta > 0$, we have $F^\lambda(x^*, y^t) \to 0$ as $t \to \infty$, thereby establishing that $x^t = C(y^t) \to x^*$ as $t \to \infty$.
Here again, the result generalizes straightforwardly to the case of multiple Nash equilibria (with identical proofs, modulo using the point-to-set distance). We omit the statement.

6 Conclusion

We examined a model of game-theoretic learning based on OMD with asynchronous and delayed information. By focusing on games with $\lambda$-variationally stable equilibria, we showed that the sequence of play induced by OMD converges whenever the feedback delays faced by the players are synchronous and bounded.
Subsequently, to tackle fully decentralized, asynchronous environments with unbounded feedback delays (possibly growing superlinearly in the game's horizon), we showed that our convergence result still holds under delayed mirror descent, a variant of vanilla OMD that leverages past information even in rounds where no feedback is received. To further enhance the distributed aspect of the algorithm, in future work we intend to focus on the case where the players' gradient input is not only delayed, but also subject to stochastic imperfections; or, taking this to its logical extreme, where players only observe their in-game payoffs and have no gradient information at all.

7 Acknowledgments

Zhengyuan Zhou is supported by a Stanford Graduate Fellowship, and he would like to thank Walid Krichene and Alex Bayen for stimulating discussions (and their charismatic research style) that firmly planted the initial seeds for this work. Panayotis Mertikopoulos gratefully acknowledges financial support from the Huawei Innovation Research Program ULTRON and the ANR JCJC project ORACLESS (grant no. ANR-16-CE33-0004-01). Claire Tomlin is supported in part by the NSF CPS:FORCES grant (CNS-1239166).

References

[1] S. Arora, E. Hazan, and S. Kale, The multiplicative weights update method: A meta-algorithm and applications, Theory of Computing, 8 (2012), pp. 121-164.
[2] M. Balandat, W. Krichene, C. Tomlin, and A. Bayen, Minimizing regret on reflexive Banach spaces and Nash equilibria in continuous zero-sum games, in NIPS '16: Proceedings of the 30th International Conference on Neural Information Processing Systems, 2016.
[3] A. Blum, On-line algorithms in machine learning, in Online Algorithms, Springer, 1998, pp. 306-325.
[4] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006.
[5] J. Cohen, A. Héliou, and P. Mertikopoulos, Learning with bandit feedback in potential games, in NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.
[6] T. Desautels, A. Krause, and J. W. Burdick, Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization, Journal of Machine Learning Research, 15 (2014), pp. 3873-3923.
[7] E. Hazan, Introduction to Online Convex Optimization, Foundations and Trends in Optimization, Now Publishers, 2016.
[8] P. Joulani, A. György, and C. Szepesvári, Online learning under delayed feedback, in Proceedings of the 30th International Conference on Machine Learning (ICML '13), 2013, pp. 1453-1461.
[9] A. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences, 71 (2005), pp. 291-307.
[10] S. Krichene, W. Krichene, R. Dong, and A. Bayen, Convergence of heterogeneous distributed learning in stochastic routing games, in 53rd Annual Allerton Conference on Communication, Control, and Computing, IEEE, 2015, pp. 480-487.
[11] W. Krichene, B. Drighès, and A. M. Bayen, Online learning of Nash equilibria in congestion games, SIAM Journal on Control and Optimization, 53 (2015), pp. 1056-1081.
[12] K. Lam, W. Krichene, and A. Bayen, On learning how players learn: Estimation of learning dynamics in the routing game, in ICCPS '16: 7th ACM/IEEE International Conference on Cyber-Physical Systems, IEEE, 2016, pp. 1-10.
[13] N. Littlestone and M. K. Warmuth, The weighted majority algorithm, Information and Computation, 108 (1994), pp. 212-261.
[14] R. Mehta, I. Panageas, and G. Piliouras, Natural selection as an inhibitor of genetic diversity: Multiplicative weights updates algorithm and a conjecture of haploid genetics, in ITCS '15: Proceedings of the 6th Conference on Innovations in Theoretical Computer Science, 2015.
[15] P. Mertikopoulos, Learning in games with continuous action sets and unknown payoff functions, https://arxiv.org/abs/1608.07310, 2016.
[16] P. Mertikopoulos, C. H. Papadimitriou, and G. Piliouras, Cycles in adversarial regularized learning, in SODA '18: Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, to appear.
[17] A. S. Nemirovski and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley, New York, NY, 1983.
[18] Y. Nesterov, Primal-dual subgradient methods for convex problems, Mathematical Programming, 120 (2009), pp. 221-259.
[19] G. Palaiopanos, I. Panageas, and G. Piliouras, Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos, in NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.
[20] K. Quanrud and D. Khashabi, Online learning with adversarial delays, in Advances in Neural Information Processing Systems, 2015, pp. 1270-1278.
[21] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, vol. 317, Springer Science & Business Media, 2009.
[22] S. Shalev-Shwartz et al., Online learning and online convex optimization, Foundations and Trends in Machine Learning, 4 (2012), pp. 107-194.
[23] S. Shalev-Shwartz and Y. Singer, Convex repeated games and Fenchel duality, in Advances in Neural Information Processing Systems 19, MIT Press, 2007, pp. 1265-1272.
[24] Y. Viossat and A. Zapechelnyuk, No-regret dynamics and fictitious play, Journal of Economic Theory, 148 (2013), pp. 825-842.
[25] L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research, 11 (2010), pp. 2543-2596.
[26] Z. Zhou, N. Bambos, and P. Glynn, Dynamics on linear influence network games under stochastic environments, in International Conference on Decision and Game Theory for Security, Springer, 2016, pp. 114-126.
[27] Z. Zhou, B. Yolken, R. A. Miura-Ko, and N. Bambos, A game-theoretical formulation of influence networks, in American Control Conference (ACC), IEEE, 2016, pp. 3802-3807.
[28] M. Zhu and E. Frazzoli, Distributed robust adaptive equilibrium computation for generalized convex games, Automatica, 63 (2016), pp. 82-91.
[29] M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, in ICML '03: Proceedings of the 20th International Conference on Machine Learning, 2003, pp. 928-936.