{"title": "Repeated Games against Budgeted Adversaries", "book": "Advances in Neural Information Processing Systems", "page_first": 1, "page_last": 9, "abstract": "We study repeated zero-sum games against an adversary on a budget. Given that an adversary has some constraint on the sequence of actions that he plays, we consider what ought to be the player's best mixed strategy with knowledge of this budget. We show that, for a general class of normal-form games, the minimax strategy is indeed efficiently computable and relies on a random playout\" technique. We give three diverse applications of this algorithmic template: a cost-sensitive \"Hedge\" setting, a particular problem in Metrical Task Systems, and the design of combinatorial prediction markets.\"", "full_text": "Repeated Games against Budgeted Adversaries\n\nJacob Abernethy\u2217\n\nDivision of Computer Science\n\nUC Berkeley\n\njake@cs.berkeley.edu\n\nmanfred@cse.ucsc.edu\n\nManfred K. Warmuth\u2020\n\nDepartment of Computer Science\n\nUC Santa Cruz\n\nAbstract\n\nWe study repeated zero-sum games against an adversary on a budget. Given that\nan adversary has some constraint on the sequence of actions that he plays, we\nconsider what ought to be the player\u2019s best mixed strategy with knowledge of\nthis budget. We show that, for a general class of normal-form games, the min-\nimax strategy is indeed ef\ufb01ciently computable and relies on a \u201crandom playout\u201d\ntechnique. We give three diverse applications of this new algorithmic template:\na cost-sensitive \u201cHedge\u201d setting, a particular problem in Metrical Task Systems,\nand the design of combinatorial prediction markets.\n\n1\n\nIntroduction\n\nHow can we reasonably expect to learn given possibly adversarial data? 
Overcoming this obstacle has been one of the major successes of the Online Learning framework or, more generally, of the so-called competitive analysis of algorithms: rather than measure an algorithm only by the cost it incurs, we consider this cost relative to an optimal "comparator algorithm" which has knowledge of the data in advance. A classic example is the so-called "experts setting": assume we must predict a sequence of binary outcomes and we are given access to a set of experts, each of which reveals its own prediction for each outcome. After each round we learn the true outcome and, hence, which experts predicted correctly or incorrectly. The experts setting is based around a simple assumption: while some experts' predictions may be adversarial, we have an a priori belief that there is at least one good expert whose predictions will be reasonably accurate. Under this relatively weak good-expert assumption, one can construct algorithms that have quite strong loss guarantees.

Another way to interpret this sequential prediction model is to treat it as a repeated two-player zero-sum game against an adversary on a budget; that is, the adversary's sequence of actions is restricted in that play ceases once the adversary exceeds the budget. In the experts setting, the assumption "there is a good expert" can be reinterpreted as "Nature shall not let the best expert err too frequently", i.e. no more than some fixed number of times.

In the present paper, we develop a general framework for repeated game-playing against an adversary on a budget, and we provide a simple randomized strategy for the learner/player for a particular class of these games. The proposed algorithms are based on a technique, which we refer to as a "random playout", that has become a very popular heuristic for solving games with massively large state spaces.
Roughly speaking, a random playout in an extensive-form game is a way to measure the likely outcome at a given state by finishing the game randomly from this state. Random playouts, often known simply as Monte Carlo methods, have become particularly popular for solving the game of Go [5], which has led to much follow-up work for general games [12, 11]. The Budgeted Adversary games we consider also involve exponentially large state spaces, yet we achieve efficiency using these random playouts. The key result of this paper is that the proposed random playout is not simply a good heuristic; it is indeed minimax optimal for the games we consider.

∗Supported by a Yahoo! PhD Fellowship and NSF grant 0830410.
†Supported by NSF grant IIS-0917397.

Abernethy et al [1] were the first to use a random playout strategy to optimally solve an adversarial learning problem, namely for the case of the so-called Hedge setting introduced by Freund and Schapire [10]. Indeed, their model can be interpreted as a particular special case of a Budgeted Adversary problem. The generalized framework that we give in the first half of the paper, however, has a much larger range of applications. We give three such examples, described briefly below; more details are given in the second half of the paper.

Cost-sensitive Hedge Setting. In the standard Hedge setting, it is assumed that each expert suffers a cost in [0, 1] on each round. But a surprisingly overlooked case is when the cost ranges differ, where expert i may suffer a per-round cost in [0, c_i] for some fixed c_i > 0. The vanilla approach, to use a generic bound of max_i c_i, is extremely loose, and we know of no better bounds for this case. Our results provide the optimal strategy for this cost-sensitive Hedge setting.

Metrical Task Systems (MTS).
The MTS problem is a decision/learning problem similar to the Hedge setting above but with an added difficulty: the learner is required to pay the cost of moving through a given metric space. Finding even a near-optimal generic algorithm has remained elusive for some time, with recent encouraging progress made in one special case [2], the so-called "weighted-star" metric. Our results provide a simple minimax optimal algorithm for this problem.

Combinatorial Prediction Market Design. There has been a great deal of work on designing so-called prediction markets, where bettors may purchase contracts that pay off when the outcome of a future event is correctly guessed. One important goal of such markets is to minimize the potential risk of the "market maker" who sells the contracts and pays the winning bettors. Another goal is to design "combinatorial" markets, that is, markets where the outcome space may be complex. The latter has proven quite challenging, and there are few positive results in this area. We show how to translate the market-design problem into a Budgeted Adversary problem and, from there, how to incorporate certain kinds of combinatorial outcomes.

2 Preliminaries

Notation: We shall write [n] for the set {1, 2, ..., n}, and [n]* for the set of all finite-length sequences of elements of [n]. We will use the Greek symbols ρ and σ to denote such sequences i_1 i_2 ... i_T, where i_t ∈ [n]. We let ∅ denote the empty sequence. When we have defined some T-length sequence ρ = i_1 i_2 ... i_T, we may write ρ_t to refer to the t-length prefix of ρ, namely ρ_t = i_1 i_2 ... i_t, where t ≤ T. We will generally use w to refer to a distribution in Δ_n, the n-simplex, where w_i denotes the ith coordinate of w.
We use the symbol e_i to denote the ith basis vector in n dimensions, namely a vector with a 1 in the ith coordinate and 0's elsewhere. We shall use 1[·] to denote the "indicator function", where 1[predicate] is 1 if predicate is true, and 0 if it is false. It may be that predicate is a random variable, in which case 1[predicate] is a random variable as well.

2.1 The Setting: Budgeted Adversary Games

We will now describe the generic sequential decision problem, where a problem instance is characterized by the following triple: an n × n loss matrix M, a monotonic "cost function" cost : [n]* → R+, and a cost budget k. A cost function is monotonic as long as it satisfies the relation cost(ρσ) ≤ cost(ρiσ) for all ρ, σ ∈ [n]* and all i ∈ [n]. Play proceeds as follows:

1. On each round t, the player chooses a distribution w_t ∈ Δ_n over his action space.
2. An outcome i_t ∈ [n] is chosen by Nature (potentially an adversary).
3. The player suffers loss w_t^T M e_{i_t}.
4. The game proceeds until the first round T in which the budget is spent, i.e. the round T for which cost(i_1 i_2 ... i_{T-1}) ≤ k < cost(i_1 i_2 ... i_{T-1} i_T).

The goal of the player is to choose each w_t in order to minimize the total cost of this repeated game on all sequences of outcomes. Note, importantly, that the player can learn from the past, and hence would like an efficiently computable function w : [n]* → Δ_n, where on round t the player is given ρ_{t-1} = (i_1 ... i_{t-1}) and sets w_t ← w(ρ_{t-1}). We can define the worst-case cost of an algorithm w : [n]* → Δ_n by its performance against a worst-case sequence, that is,

    WorstCaseLoss(w; M, cost, k) := max_{ρ = i_1 i_2 ... ∈ [n]* : cost(ρ_{T-1}) ≤ k < cost(ρ_T)}  Σ_{t=1}^T w(ρ_{t-1})^T M e_{i_t}.

Note that above T is a parameter chosen according to ρ and the budget. We can also define the minimax loss, obtained by choosing the w(·) which minimizes WorstCaseLoss(·). Specifically,

    MinimaxLoss(M, cost, k) := min_{w : [n]* → Δ_n}  max_{ρ = i_1 i_2 ... ∈ [n]* : cost(ρ_{T-1}) ≤ k < cost(ρ_T)}  Σ_{t=1}^T w(ρ_{t-1})^T M e_{i_t}.

In the next section, we describe the optimal algorithm for a restricted class of M. That is, we obtain the mapping w which optimizes WorstCaseLoss(w; M, cost, k).

3 The Algorithm

We will start by assuming that M is a nonnegative diagonal matrix, that is, M = diag(c_1, c_2, ..., c_n) with c_i > 0 for all i. With these values c_i, define the distribution q ∈ Δ_n with q_i := (1/c_i) / (Σ_j 1/c_j).

Given a current state ρ, the algorithm will rely heavily on our ability to compute the following function Φ(·). For any ρ ∈ [n]* such that cost(ρ) > k, define Φ(ρ) := 0. Otherwise, let

    Φ(ρ) := (1 / Σ_i 1/c_i) · E_{∀t: i_t ~ q} [ Σ_{t=0}^∞ 1[cost(ρ i_1 ... i_t) ≤ k] ].

Notice that this is the expected length of a random process. Of course, we must impose the natural condition that the length of this process has a finite expectation. Also, since we assume that the cost increases, it is reasonable to require that the distribution of the length, i.e. of min{t : cost(ρ i_1 ... i_t) > k}, has an exponentially decaying tail. Under these weak conditions, the following m-trial Monte Carlo method will provide a high-probability estimate accurate to within O(m^{-1/2}).

Algorithm 1 Efficient Estimation of Φ(ρ)

  for i = 1 ... m do
    Sample: an infinite random sequence σ := i_1 i_2 ..., where Pr(i_t = i) = q_i
    Let: T_i = max{t : cost(ρ σ_{t-1}) ≤ k}
  end for
  Return: (Σ_{i=1}^m T_i) / m

Notice that the infinite sequence σ does not have to be fully generated. Instead, we can continue to sample the sequence and simply stop as soon as the condition cost(ρ σ_{t-1}) > k is reached. We can now define our algorithm in terms of Φ(·).

Algorithm 2 Player's optimal strategy

  Input: state ρ
  Compute: Φ(ρ), Φ(ρ1), Φ(ρ2), ..., Φ(ρn)
  Set: w(ρ) with values w_i(ρ) = (Φ(ρ) − Φ(ρi)) / c_i

4 Minimax Optimality

Now we prove that Algorithm 2 is both "legal" and minimax optimal.

Lemma 4.1. The vector w(ρ) computed in Algorithm 2 is always a valid distribution.

Proof. It must first be established that w_i(ρ) ≥ 0 for all i and ρ. This follows because we assume that the function cost() is monotonic, which implies that cost(ρσ) ≤ cost(ρiσ); hence cost(ρiσ) ≤ k ⟹ cost(ρσ) ≤ k, and therefore 1[cost(ρiσ) ≤ k] ≤ 1[cost(ρσ) ≤ k]. Taking the expected difference of the infinite sums of these two indicators leads to Φ(ρ) − Φ(ρi) ≥ 0, which implies w_i(ρ) ≥ 0 as desired.

We must also show that Σ_i w_i(ρ) = 1. We claim that the following recurrence relation holds for the function Φ(ρ) whenever cost(ρ) ≤ k:

    Φ(ρ) = (1 / Σ_i 1/c_i)  +  Σ_i q_i Φ(ρi),
            [first step]       [remaining steps]

This is clear from noticing that Φ is an expected random walk length, with transition probabilities defined by q, and scaled by the constant (Σ_i 1/c_i)^{-1}. Hence,

    Σ_i w_i(ρ) = Σ_i (Φ(ρ) − Φ(ρi)) / c_i
               = (Σ_i 1/c_i) Φ(ρ) − Σ_i Φ(ρi)/c_i
               = (Σ_i 1/c_i) ( (1 / Σ_i 1/c_i) + Σ_i q_i Φ(ρi) ) − Σ_i Φ(ρi)/c_i
               = 1,

where the last equality holds because q_i = (1/c_i) / (Σ_j 1/c_j).

Theorem 4.1. For M = diag(c_1, ..., c_n), Algorithm 2 is minimax optimal for the Budgeted Adversary problem. Furthermore, Φ(∅) = MinimaxLoss(M, cost, k).

Proof. First we prove an upper bound. Notice that, for any sequence ρ = i_1 i_2 ... i_T, the total cost of Algorithm 2 will be

    Σ_{t=1}^T w(ρ_{t-1})^T M e_{i_t} = Σ_{t=1}^T w_{i_t}(ρ_{t-1}) c_{i_t} = Σ_{t=1}^T (Φ(ρ_{t-1}) − Φ(ρ_t)) = Φ(∅) − Φ(ρ_T) ≤ Φ(∅),

and hence the total cost of the algorithm is always bounded by Φ(∅).

On the other hand, we claim that Φ(∅) can always be achieved by an adversary against any algorithm w′(·). Construct a sequence ρ as follows. Given that ρ_{t-1} has been constructed so far, select any coordinate i_t ∈ [n] for which w_{i_t}(ρ_{t-1}) ≤ w′_{i_t}(ρ_{t-1}), that is, where the algorithm w′ places at least as much weight on i_t as the proposed algorithm w defined in Algorithm 2. This must always be possible because both w(ρ_{t-1}) and w′(ρ_{t-1}) are distributions and neither can strictly dominate the other. Set ρ_t ← ρ_{t-1} i_t. Continue constructing ρ until the budget is reached, i.e. cost(ρ) > k. Now,
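Both algorithms are easy to state in code. The following is a minimal Python sketch (our own illustration, not code from the paper) of Algorithm 1's playout estimator and Algorithm 2's weights for M = diag(c_1, ..., c_n). We assume the cost function depends only on how many times each outcome has occurred, so the state can be a count vector; the exact recursion additionally assumes that only finitely many count vectors fit within the budget, an assumption the Monte Carlo estimator does not need. The names `make_strategy` and `estimate_phi` are ours.

```python
import random
from fractions import Fraction
from functools import lru_cache

def make_strategy(c, cost, k):
    """Exact Phi and the minimax weights of Algorithm 2 for M = diag(c).

    Assumes cost() depends only on the count vector s of outcomes, and that
    only finitely many count vectors satisfy cost(s) <= k (so the recursion
    terminates); estimate_phi below avoids the finiteness assumption.
    """
    n = len(c)
    Z = sum(Fraction(1, ci) for ci in c)        # Z = sum_i 1/c_i
    q = [Fraction(1, ci) / Z for ci in c]       # q_i = (1/c_i) / Z

    def bump(s, i):                             # s + e_i as a tuple
        return s[:i] + (s[i] + 1,) + s[i + 1:]

    @lru_cache(maxsize=None)
    def phi(s):
        # Phi(s) = 0 past the budget; otherwise the recurrence of Lemma 4.1:
        # Phi(s) = 1/Z + sum_i q_i Phi(s + e_i).
        if cost(s) > k:
            return Fraction(0)
        return 1 / Z + sum(q[i] * phi(bump(s, i)) for i in range(n))

    def weights(s):
        # Algorithm 2: w_i(s) = (Phi(s) - Phi(s + e_i)) / c_i.
        return [(phi(s) - phi(bump(s, i))) / c[i] for i in range(n)]

    return phi, weights

def estimate_phi(c, cost, k, s, trials=20000, seed=0):
    """Algorithm 1: estimate Phi(s) by repeated random playouts from state s."""
    rng = random.Random(seed)
    n = len(c)
    Z = sum(1.0 / ci for ci in c)
    probs = [(1.0 / ci) / Z for ci in c]
    total = 0
    for _ in range(trials):
        state = list(s)
        while cost(tuple(state)) <= k:          # stop once the budget is spent
            total += 1                          # count this within-budget prefix
            state[rng.choices(range(n), probs)[0]] += 1
    return total / trials / Z                   # scale by 1 / (sum_i 1/c_i)
```

For example, with c = (1, 2), cost(s) = max_i s_i and k = 1, the exact weights at the empty state are nonnegative and sum to 1, as Lemma 4.1 requires, and the playout estimate converges to the exact Φ(∅) at the O(m^{-1/2}) rate stated above.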
Now,\nlet us check the loss of w0 on this sequence \u03c1:\n\nit\n\nTX\n\nt=1\n\nTX\n\nt=1\n\n(\u03c1t\u22121)cit \u2265 TX\n\nw0\n\nit\n\nt=1\n\nw0(\u03c1t\u22121)>Meit =\n\nwit(\u03c1t\u22121)cit = \u03a6(\u2205) \u2212 \u03a6(\u03c1) = \u03a6(\u2205)\n\nHence, an adversary can achieve at least \u03a6(\u2205) loss for any algorithm w0.\n\n4.1 Extensions\n\nFor simplicity of exposition, we proved Theorem 4.1 under a somewhat limited scope: only for\ndiagonal matrices M, known budget k and cost(). But with some work, these restrictions can be\nlifted. We sketch a few extensions of the result, although we omit the details due to lack of space.\nFirst, the concept of a cost() function and a budget k is not entirely necessary. Indeed, we can\nrede\ufb01ne the Budgeted Adversary game in terms of an arbitrary stopping criterion \u03b4 : [n]\u2217 \u2192 {0, 1},\nwhere \u03b4(\u03c1) = 0 is equivalent to \u201cthe budget has been exceeded\u201d. The only requirement is that \u03b4()\nis monotonic, which is naturally de\ufb01ned as \u03b4(\u03c1i\u03c3) = 1 =\u21d2 \u03b4(\u03c1\u03c3) = 1 for all \u03c1, \u03c3 \u2208 [n]\u2217 and\nall i \u2208 [n]. This alternative budget interpretation lets us consider the sequence \u03c1 as a path through\n\n4\n\n\fa game tree. At a given node \u03c1t of the tree, the adversary\u2019s action it+1 determines which branch to\nfollow. As soon as \u03b4(\u03c1t) = 0 we have reached a terminal node of this tree.\nSecond, we need not assume that the budget k, or even the generalized stopping criterion \u03b4(), is\nknown in advance. Instead, we can work with the following generalization: the stopping criterion \u03b4\nis drawn from a known prior \u03bb and given to the adversary before the start of the game. The resulting\noptimal algorithm depends simply on estimating a new version of \u03a6(\u03c1). 
Φ(ρ) is now redefined as an expectation over both a random σ and a random δ drawn from the posterior of λ, that is, where we condition on the event δ(ρ) = 1.

Third, Theorem 4.1 can be extended to a more general class of M, namely inverse-nonnegative matrices, where M is invertible and M^{-1} has all nonnegative entries. (In all the examples we give we need only diagonal M, but we sketch this generalization for completeness.) If we let 1_n be the vector of n ones, then define D = diag^{-1}(M^{-1} 1_n), which is a nonnegative diagonal matrix. Also let N = D M^{-1} and notice that the rows of N are the normalized rows of M^{-1}. We can use Algorithm 2 with the diagonal matrix D, and obtain a distribution w′(ρ) for any ρ. To obtain an algorithm for the matrix M (not D), we simply let w(ρ) = (w′(ρ)^T N)^T, which is guaranteed to be a distribution. The loss of w is identical to that of w′ since w(ρ)^T M = w′(ρ)^T D by construction.

Fourth, we have only discussed minimizing loss against a budgeted adversary. But all the results can be extended easily to the case where the player is instead maximizing gain (and the adversary is minimizing). A particularly surprising result is that the minimax strategy is identical in either case; that is, the recursive definition of w_i(ρ) is the same whether the player is maximizing or minimizing. However, the termination condition may change depending on whether we are minimizing or maximizing. For example, in the expert setting, the game stops when all experts have cost larger than k versus when at least one expert has gain at least k. Therefore, for the same budget size k, the minimax value of the gain version is typically smaller than the value of the loss version.

Simplified Notation.
For many examples, including two that we consider below, recording the entire sequence ρ is unnecessary; the only relevant information is the number of times each i occurs in ρ, not where it occurs. This is the case precisely when the function cost(ρ) is unchanged under permutations of ρ. In such situations, we can work with a smaller state space which records the "counts" of each i in the sequence ρ. We will use the notation s ∈ N^n, where s_t = e_{i_1} + ... + e_{i_t} for the sequence ρ_t = i_1 i_2 ... i_t.

5 The Cost-Sensitive Hedge Setting

A straightforward application of Budgeted Adversary games is the "Hedge setting" introduced by Freund and Schapire [10], a version of the aforementioned experts setting. The minimax algorithm for this special case was already thoroughly developed by Abernethy et al [1]. We describe an interesting extension that can be achieved using our techniques and that has not yet been solved.

The Hedge game goes as follows. A learner must predict a sequence of distributions w_t ∈ Δ_n, and receives a sequence of loss vectors ℓ_t ∈ {0, 1}^n. The total loss to the learner is Σ_t w_t · ℓ_t, and the game ceases only once the best expert has more than k errors, i.e. min_i Σ_t ℓ_{t,i} > k. The learner wants to minimize his total loss.

The natural way to transform the Hedge game into a Budgeted Adversary problem is as follows. We let s be the state, defined as the vector of cumulative losses of all the experts, and set

    M = I (the n × n identity matrix),    cost(s) = min_i s_i,    Σ_t w_t · ℓ_t = Σ_t w_t^T M e_{i_t}.

The proposed reduction almost works, except for one key issue: it only allows cost vectors of the form ℓ_t = M e_{i_t} = e_{i_t}, since by definition Nature chooses columns of M. However, as shown in Abernethy et al, this is not a problem.

Lemma 5.1 (Lemma 11 and Theorem 12 of [1]).
In the Hedge game, the worst-case adversary always chooses ℓ_t ∈ {e_1, ..., e_n}.

The standard and more well-known, although non-minimax, algorithm for the Hedge setting [10] uses a simple modification of the Weighted Majority Algorithm [14], and is described simply by setting w_i(s) = exp(−η s_i) / Σ_j exp(−η s_j). With the appropriate tuning of η, it is possible to bound the total loss of this algorithm by k + √(2k ln n) + ln n, which is known to be roughly optimal in the limit. Abernethy et al [1] provide the minimax optimal algorithm, but state the bound in terms of the expected length of a random walk. This is essentially equivalent to our description of the minimax cost in terms of Φ(∅).

A significant drawback of the Hedge result, however, is that it requires the losses to be uniformly bounded in [0, 1], that is, ℓ_t ∈ [0, 1]^n. Ideally, we would like an algorithm and a bound that can handle non-uniform cost ranges, i.e. where expert i suffers loss in some range [0, c_i]. The ℓ_{t,i} ∈ [0, 1] assumption is fundamental to the Hedge analysis, and we see no simple way of modifying it to achieve a tight bound. The simplest trick, which is just to take c_max := max_i c_i, leads to a bound of the form k + √(2 c_max k ln n) + c_max ln n, which we know to be very loose. Intuitively, this is because a single "risky" expert, with a large c_i, should not affect the bound significantly.

In our Budgeted Adversary framework, this case can be dealt with trivially: letting M = diag(c_1, ..., c_n) and cost(s) = min_i s_i c_i immediately gives an algorithm that, by Theorem 4.1, we know to be minimax optimal. According to the same theorem, the minimax loss bound is simply Φ(∅) which, unfortunately, is expressed in terms of a random walk length. We do not know how to obtain a closed-form estimate of this expression, and we leave this as an intriguing open question.

6 Metrical Task Systems

A classic problem from the Online Algorithms community is known as Metrical Task Systems (MTS), which we now describe. A player (decision-maker, algorithm, etc.) is presented with a finite metric space and, on each of a sequence of rounds, occupies a single state (or point) within this metric space. At the beginning of each round the player is presented with a cost vector describing the cost of occupying each point in the metric space. The player has the option to remain at his present state and pay this state's associated cost, or he can decide to switch to another point in the metric space and pay the cost of the new state. In the latter case, however, the player must also pay the switching cost, which is exactly the metric distance between the two points.

The MTS problem is a useful abstraction for a number of problems; among these is job scheduling. An algorithm would like to determine on which machine, across a large network, it should process a job. At any given time point, the algorithm observes the number of available cycles on each machine, and can choose to migrate the job to another machine. Of course, if the subsequent machine is a great distance away, then the algorithm also pays the travel time of the job migration through the network.

Notice that, were we given the sequence of cost vectors in advance, we could compute the optimal path of the algorithm that minimizes total cost. Indeed, this is efficiently solved by dynamic programming, and we will refer to its value as the optimal offline cost, or just the offline cost. What we would like is an algorithm that performs well relative to the offline cost without knowledge of the sequence of cost vectors.
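The offline comparator just mentioned is a short dynamic program. The following is a minimal sketch of it in Python (our own code, with our own function name; not an algorithm from the paper), under the convention that on each round the player first moves, paying the metric distance, and then pays the service cost at its new point:

```python
def offline_mts_cost(costs, dist, start):
    """Optimal offline MTS cost by dynamic programming.

    costs[t][j] : cost of occupying point j on round t
    dist[i][j]  : metric distance (switching cost) between points i and j
    start       : the point the player initially occupies
    """
    n = len(dist)
    # best[j] = cheapest total cost of any schedule so far that ends at point j
    best = [dist[start][j] for j in range(n)]
    for cost_t in costs:
        # Either stay put or switch (paying the metric distance); in both
        # cases we then pay the round's service cost at the occupied point.
        best = [min(best[i] + dist[i][j] for i in range(n)) + cost_t[j]
                for j in range(n)]
    return min(best)
```

On a uniform two-point metric with cost vectors ⟨0, 5⟩ then ⟨5, 0⟩ and start at the first point, the offline optimum is 1: sit still for the first round, then pay the unit switching cost and a service cost of 0.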
The standard measure of performance for an online algorithm is the competitive ratio, which is the ratio of the cost of the online algorithm to the optimal offline cost. For all the results discussed below, we assume that the online algorithm can maintain a randomized state (a distribution over the metric space) and pays the expected cost according to this random choice; randomized algorithms tend to exhibit much better competitive ratios than deterministic ones.

When the metric is uniform, i.e. where all pairs of points are at unit distance, it is known that the competitive ratio is O(log n), where n is the number of points in the metric; this was shown by Borodin, Linial and Saks, who introduced the problem [4]. For general metric spaces, Bartal et al achieved a competitive ratio of O(log^6 n) [3], and this was improved to O(log^2 n) by Fiat and Mendel [9]. The latter two techniques, however, rely on a scheme of randomly approximating the metric space with a hierarchical tree metric, adding a (likely unnecessary) multiplicative cost factor of log n. It is widely believed that the minimax competitive ratio is O(log n) in general, but this gap has remained open for at least 10 years.

The most significant progress towards O(log n) is the 2007 work of Bansal et al [2], who achieved such a ratio for the case of "weighted-star" metrics. A weighted star is a metric in which each point i has a fixed distance d_i from some "center state", and traveling between any states i and j requires going through the center, hence incurring a switching cost of d_i + d_j. For weighted-star metrics, Bansal et al managed to justify two simplifications which are quite useful:

1. We can assume that the cost vector is of the form ⟨0, ..., ∞, ..., 0⟩; that is, all states receive 0 cost, except some state i which receives an infinite cost.

2.
When the online algorithm is currently maintaining a distribution w over the metric space, and an infinite cost occurs at state i, we can assume¹ that the algorithm incurs exactly 2 d_i w_i, exactly the cost of having w_i probability weight enter and leave i through the center.

Bansal et al provide an efficient algorithm for this setting using primal-dual techniques developed for solving linear programs. With the methods developed in the present paper, however, we can give the minimax optimal online algorithm under the above simplifications. Notice that the adversary is now choosing a sequence of states i_1, i_2, i_3, ... ∈ [n] at which to assign an infinite cost. If we let ρ = i_1 i_2 i_3 ..., then the online algorithm's job is to choose a sequence of distributions w(ρ_t), and it pays 2 d_{i_{t+1}} w_{i_{t+1}}(ρ_t) at each step. In the end, the online algorithm's cost is compared to the offline MTS cost of ρ, which we will call cost(ρ). Assume² we know the offline cost in advance, say it is k, and define M = diag(2d_1, ..., 2d_n). Then the player's job is to select an algorithm w which minimizes

    max_{ρ = (i_1, ..., i_T) : cost(ρ) ≤ k}  Σ_{t=1}^T w(ρ_{t-1})^T M e_{i_t}.

As we have shown, Algorithm 2 is minimax optimal for this setting. The competitive ratio of this algorithm is precisely lim sup_{k→∞} (1/k) MinimaxLoss(M, cost, k). Notice the convenient trick here: by bounding a priori the cost of the offline algorithm at k, we can simply imagine playing this repeated game until the budget k is reached. Then the competitive ratio is just the worst-case loss over the offline cost, k. On the downside, we do not know of any easy way to bound the worst-case loss Φ(∅).

7 Combinatorial Information Markets

We now consider the design of so-called cost-function-based information markets, a popular type of prediction market.
This work is well developed by Chen and Pennock [7], with much useful discussion by Chen and Vaughn [8]. We refer the reader to the latter work, which provides a very clear picture of the nice relationship between online learning and the design of information markets.

In the simplest setting, a prediction market is a mechanism for selling n types of contract, where a contract of type i corresponds to some potential future outcome, say "event i will occur". The standard assumption is that the possible outcomes are mutually exclusive, so only one of the n events will occur; for example, a pending election with n competing candidates and one eventual winner. When a bettor purchases a contract of type i, the manager of the market, or "market maker", promises to pay out $1 if the outcome is i and $0 otherwise.

A popular research question in recent years is how to design such prediction markets when the outcome has a combinatorial structure. An election might produce a complex outcome like a group of candidates winning, and a bettor may desire to bet on a complex predicate, such as "none of the winning candidates will be from my state". This question is explored in Hanson [13], although without much discussion of the relevant computational issues. The computational aspects of combinatorial information markets are addressed in Chen et al [6], who provide a particular hardness result regarding the computation of certain price functions, as well as a positive result for an alternative type of combinatorial market. In the present section, we propose a new technique for designing combinatorial markets using the tools laid out in the present work.

In this type of information market, the task of the market maker is to choose a price for each of the n contracts, where the prices may be set adaptively according to the present demand. Let
Let\ns \u2208 Nn denote the current volume, where si is the number of contracts sold of type i. In a cost-\nfunction-based market, these prices are set according to a given convex \u201ccost function\u201d C(s) which\n\n1Precisely, they claim that it should be upper-bounded by 4di. We omit the details regarding this issue, but\n\nit only contributes a multiplicative factor of 2 to the competitive ratio.\n\n2Even when we do not know the of\ufb02ine cost in advance, standard \u201cdoubling tricks\u201d allow you to guess this\n\nvalue and increase the guess as the game proceeds. For space, we omit these details.\n\n7\n\n\fb logPn\n\ni1i2 . . . is purchased, the market maker\u2019s total earning isP\n\nrepresents a potential on the demand. It is assumed that C(\u00b7) satis\ufb01es the relation C(s + \u03b11) =\nC(s) + \u03b1 for all s, and \u03b1 > 0 and \u22022C\n> 0. A typical example of such a cost function is C(s) =\n\u2202s2\ni\ni=1 exp(si/b) where b is a parameter (see Chen and Pennock for further discussion [7]); it\u2019s\neasy to check this function satis\ufb01es the desired properties.\nGiven the current volume s, the price of contract i is set at C(s + ei) \u2212 C(s). This pricing scheme\nhas the advantage that the total money earned in this market is easy to compute: it\u2019s exactly C(s)\nregardless of the order in which the contracts were purchased. A disadvantage of this market, how-\never, is that the posted prices (typically) sum to greater than $1! A primary goal of an information\nmarket is to incentivize bettors to reveal their private knowledge of the outcome of an event. If a\ngiven bettor believes the true distribution of the outcome to be q \u2208 \u2206n, he will have an incentive to\npurchase any contract i for which the current price pi is smaller than qi, thus providing positive ex-\npected reward (relative to his predicted distribution). 
Using this cost-function scheme, it is possible that qi < C(s + ei) − C(s) for all i, and hence a bettor will have no incentive to bet.
We propose instead an alternative market mechanism that avoids this difficulty: for every volume state s, the market maker advertises a price vector w(s) ∈ ∆n. If a contract of type i is purchased, the state proceeds to s + ei, and the market maker earns wi(s). If a sequence of contracts i1, i2, . . . is purchased, the market maker's total earning is Σt w(ei1 + · · · + eit−1) · eit. On the other hand, if the final demand is s, then in the worst case the market maker may have to pay out a total of maxi si dollars. If we assume the market maker has a fixed budget k on the maximum number of contracts he is willing to sell, and wants to maximize the total money earned from selling contracts subject to this constraint, then we have3 exactly a Budgeted Adversary problem: let M be the identity and let cost(s) := maxi si.
This looks quite similar to the Budgeted Adversary reduction in the Hedge setting described above, which is perhaps not too surprising given the strong connections discovered by Chen and Vaughan [8] between learning with experts and market design. But this reduction gives us additional power: we now have a natural way to design combinatorial prediction markets. We sketch one such example, but we note that many more can be worked out as well.
Assume we are in a setting with n election candidates, where some subset of size m < n will become the "winners", and any such subset is possible. In this case, we can imagine a market maker selling a contract of type i with the following promise: if candidate i is in the winning subset, the payout is 1/m, and 0 otherwise. For similar reasons as above, the market maker should sell contracts at prices pi where Σi pi = 1.
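In this subset market, if the demand is s and the winning set is U, the market maker pays out Σ_{i∈U} si/m; his worst case is the maximum of this quantity over all feasible winning sets of size m. Although that is a maximum over exponentially many subsets, it is easy to evaluate: the maximizing U always consists of the m contract types with the largest volumes, so the cost is just the average of the top m entries of s. A small sketch (the function name is ours):

```python
def subset_betting_cost(s, m):
    """Worst-case payout: max over size-m subsets U of sum_{i in U} s_i / m.
    The maximum is attained by the m largest volumes, so this equals the
    mean of the top-m entries of s."""
    if not 0 < m <= len(s):
        raise ValueError("need 0 < m <= n")
    return sum(sorted(s, reverse=True)[:m]) / m

# n = 4 candidates, m = 2 winners; s counts contracts sold per candidate:
print(subset_betting_cost([5, 1, 3, 2], 2))  # (5 + 3) / 2 = 4.0
# With m = 1 this recovers cost(s) = max_i s_i from the simple market above.
```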
If we assume that the market maker has a budget constraint of k on the final payout, then we can handle this new setting within the Budgeted Adversary framework by simply modifying the cost function appropriately:

cost(s) = max_{U ⊂ [n], |U| = m} Σ_{i ∈ U} si/m.

This solution looks quite simple, so what did we gain? The benefit of our Budgeted Adversary framework is that we can handle arbitrary monotonic budget constraints, and the combinatorial nature of this problem can be encoded within the budget. We showed this for the case of "subset betting", but it can be applied to a wide range of settings with combinatorial outcomes.

8 Open problem

We have provided a very general framework for solving repeated zero-sum games against a budgeted adversary. Unfortunately, the generality of these results only goes as far as games with payoff matrices that are inverse-nonnegative. For one-shot games, of course, Von Neumann's minimax theorem leads us to an efficient algorithm, namely linear programming, which can handle any payoff matrix, and we would hope the same is achievable here. We thus pose the following open question: Is there an efficient algorithm for solving Budgeted Adversary games for arbitrary matrices M?

3 The careful reader may notice that this modified model may lead to a problem not present in the cost-function-based markets: an arbitrage opportunity for the bettors. This issue can be dealt with by including a sufficient transaction fee per contract, but we omit these details due to space constraints.

References

[1] J. Abernethy, M. K. Warmuth, and J. Yellin. Optimal strategies from random walks. In Proceedings of the 21st Annual Conference on Learning Theory (COLT 08), pages 437–445, July 2008.

[2] Nikhil Bansal, Niv Buchbinder, and Joseph (Seffi) Naor. A primal-dual randomized algorithm for weighted paging.
In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pages 507–517. IEEE Computer Society, 2007.

[3] Y. Bartal, A. Blum, C. Burch, and A. Tomkins. A polylog(n)-competitive algorithm for metrical task systems. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 711–719, 1997.

[4] A. Borodin, N. Linial, and M. E. Saks. An optimal on-line algorithm for metrical task system. Journal of the ACM (JACM), 39(4):745–763, 1992.

[5] B. Brügmann. Monte Carlo Go. Master's thesis, unpublished, 1993.

[6] Y. Chen, L. Fortnow, N. Lambert, D. M. Pennock, and J. Wortman. Complexity of combinatorial market makers. In Proceedings of the ACM Conference on Electronic Commerce (EC), 2008.

[7] Y. Chen and D. M. Pennock. A utility framework for bounded-loss market makers. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, pages 49–56, 2007.

[8] Y. Chen and J. W. Vaughan. A new understanding of prediction markets via no-regret learning. arXiv preprint arXiv:1003.0034, 2010.

[9] A. Fiat and M. Mendel. Better algorithms for unfair metrical task systems and applications. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, pages 725–734, 2000.

[10] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119–139, 1997. Special issue for EuroCOLT '95.

[11] S. Gelly and D. Silver. Combining online and offline knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning, page 280, 2007.

[12] S. Gelly, Y. Wang, R. Munos, and O. Teytaud. Modification of UCT with patterns in Monte-Carlo Go, 2006.

[13] R. Hanson. Combinatorial information market design. Information Systems Frontiers, 5(1):107–119, 2003.

[14] N. Littlestone and M. K. Warmuth. The Weighted Majority algorithm. Inform. Comput., 108(2):212–261, 1994. Preliminary version in FOCS '89.