{"title": "Stochastic convex optimization with bandit feedback", "book": "Advances in Neural Information Processing Systems", "page_first": 1035, "page_last": 1043, "abstract": "This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \\in X$. We demonstrate a generalization of the ellipsoid algorithm that incurs $O(\\poly(d)\\sqrt{T})$ regret. Since any algorithm has regret at least $\\Omega(\\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.", "full_text": "Stochastic convex optimization with bandit\n\nfeedback\n\nAlekh Agarwal\n\nDepartment of EECS\n\nUC Berkeley\n\nDean P. Foster\n\nDepartment of Statistics\n\nUniversity of Pennysylvania\n\nDaniel Hsu\n\nMicrosoft Research\n\nNew England\n\nalekh@cs.berkeley.edu\n\ndean.foster@gmail.com\n\ndahsu@microsoft.com\n\nSham M. Kakade\n\nDepartment of Statistics Microsoft Research\n\nUniversity of Pennysylvania\n\nNew England\n\nskakade@microsoft.com\n\nAlexander Rakhlin\n\nDepartment of Statistics\n\nUniversity of Pennysylvania\nrakhlin@wharton.upenn.edu\n\nAbstract\n\nThis paper addresses the problem of minimizing a convex, Lipschitz func-\ntion f over a convex, compact set X under a stochastic bandit feedback\nmodel. In this model, the algorithm is allowed to observe noisy realizations\nof the function value f (x) at any query point x \u2208 X . We demonstrate\ngret. 
Since any algorithm has regret at least \u2126(\u221aT ) on this problem, our\nalgorithm is optimal in terms of the scaling with T .\n\na generalization of the ellipsoid algorithm that incurs (cid:101)O(poly(d)\u221aT ) re-\n\n1 Introduction\n\nThis paper considers the problem of stochastic convex optimization under bandit feedback\nwhich is a generalization of the classical multi-armed bandit problem, formulated by Robbins\nin 1952. Our problem is speci\ufb01ed by a mean cost function f which is assumed to be convex\nand Lipschitz, and a convex, compact domain X . The algorithm repeatedly queries f at\npoints x \u2208 X and observes noisy realizations of f (x). Performance of an algorithm is\nmeasured by regret, that is the di\ufb00erence between values of f at the query points and the\nminimum value of f over X . This specializes to the classical K-armed setting when X is\nthe probability simplex and f is linear. Several recent works consider the continuum-armed\nbandit problem, making di\ufb00erent assumptions on the structure of f over X . For instance,\nthe f is assumed to be linear in the paper [9], a Lipschitz condition on f is assumed in\nthe works [3, 12, 13], and Srinivas et al. [16] exploit the structure of a Gaussian processes.\nFor these \u201cnon-parametric\u201d bandit problems, the rates of regret (after T queries) are of the\nform T \u03b1, with exponent \u03b1 approaching 1 for large dimension d.\n\nThe question addressed in the present paper is: How can we leverage convexity of the mean\ncost function as a structural assumption? Our main contribution is an algorithm which\n\nachieves, with high probability, an \u02dcO(poly(d)\u221aT ) regret after T requests. This result holds\nfor all convex Lipschitz mean cost functions. We observe that the rate with respect to T\ndoes not deteriorate with d unlike the non-parametric problems mentioned earlier. 
Let us also remark that \u2126(\u221adT ) lower bounds have been shown for linear mean cost functions, making our algorithm optimal up to factors polynomial in d and logarithmic in T .\n\nPrior Work. Asymptotic rates of \u221aT have been previously achieved by Cope [8] for unimodal functions under stringent conditions (smoothness and strong convexity of the mean cost function, in addition to the maximum being achieved inside the set). Auer et al. [4] show a regret of \u02dcO(\u221aT ) for a one-dimensional non-convex problem with a finite number of maximizers. Yu and Mannor [17] recently studied unimodal bandits in one dimension, but they do not consider higher-dimensional cases. Bubeck et al. [7] show \u221aT regret for a subset of Lipschitz functions with certain metric properties. Convex, Lipschitz cost functions have also been studied in the adversarial model [10, 12], but the best-known regret bounds for these algorithms are O(T^{3/4}). We also note that previous results of Agarwal et al. [1] and Nesterov [15] do not apply to our setting, as noted in the full-length version of this paper [2].\n\nThe problem addressed in this paper is closely related to noisy zeroth-order convex optimization, whereby the algorithm queries a point of the domain X and receives a noisy evaluation of the function. While the literature on stochastic optimization is vast, we emphasize that an optimization guarantee does not necessarily imply a bound on regret. In particular, we directly build on an optimization method developed by Nemirovski and Yudin [14, Chapter 9]. Given \u03b5 > 0, the method is guaranteed to produce an \u03b5-minimizer in \u02dcO(poly(d)\u03b5^\u22122) iterations, yet this does not immediately imply small regret. The latter is the quantity of interest in this paper, since it is the standard performance measure in decision theory. 
More importantly, in many applications every query to the function involves a consumption of resources or a monetary cost. A low-regret guarantee bounds the net cost over the entire process, which an optimization error bound does not.\n\nThe remainder of this paper is organized as follows. In the next section, we give the formal problem setup and highlight differences between the regret and optimization error settings. We then present a simple algorithm and its analysis in one dimension that illustrates some of the key insights behind the general d-dimensional algorithm in Section 3. Section 4 describes our generalization of the ellipsoid algorithm for d dimensions along with its regret guarantee. Proofs of our results can be found in the full-length version [2].\n\n2 Setup\n\nIn this section we give the basic setup and the performance criterion, and explain the differences between the metrics of regret and optimization error.\n\n2.1 Problem definition and notation\n\nLet X be a compact and convex subset of R^d, and let f : X \u2192 R be a 1-Lipschitz convex function on X , so f (x) \u2212 f (x\u2032) \u2264 \u2016x \u2212 x\u2032\u2016 for all x, x\u2032 \u2208 X . We assume X is specified in a way that lets the algorithm efficiently construct the smallest Euclidean ball containing X . Furthermore, we assume the algorithm has noisy black-box access to f . Specifically, the algorithm is allowed to query the value of f at any x \u2208 X , and it observes y = f (x) + \u03b5 where \u03b5 is an independent \u03c3-subgaussian random variable with mean zero: E[exp(\u03bb\u03b5)] \u2264 exp(\u03bb^2\u03c3^2/2) for all \u03bb \u2208 R. The goal of the algorithm is to minimize its regret: after making T queries x1, . . .
, xT \u2208 X , the regret of the algorithm compared to any x\u2217 \u2208 arg min_{x\u2208X} f (x) is\n\nRT = \u2211_{t=1}^{T} [f (xt) \u2212 f (x\u2217)].   (1)\n\nWe will construct an average and confidence interval (henceforth CI) for the function values at points queried by the algorithm. Letting LB\u03b3i(x) and UB\u03b3i(x) denote the lower and upper bounds of a CI of width \u03b3i for the function estimate at a point x, we will say that CI's at two points are \u03b3-separated if LB\u03b3i(x) \u2265 UB\u03b3i(y) + \u03b3 or LB\u03b3i(y) \u2265 UB\u03b3i(x) + \u03b3.\n\n2.2 Regret vs. optimization error\n\nSince f is convex, the average \u00afxT = (1/T ) \u2211_{t=1}^{T} xt satisfies f (\u00afxT ) \u2212 f (x\u2217) \u2264 RT /T , so that low regret (1) also gives a small optimization error. The converse, however, is not necessarily true. An optimization method might query far from the minimum of the function (that is, explore) on most rounds, and output the solution at the last step. Guaranteeing a small regret typically involves a more careful balancing of exploration and exploitation.\n\nTo better understand the difference, suppose X = [0, 1], and let f (x) be one of xT^{\u22121/3}, \u2212xT^{\u22121/3} and x(x \u2212 1). Let us sample function values at x = 1/4 and x = 3/4. To distinguish the first two cases, we need \u2126(T^{2/3}) points. If f is indeed linear, we only incur O(T^{1/3}) regret on these rounds. However, if instead f (x) = x(x \u2212 1), we incur an undesirable \u2126(T^{2/3}) regret. For purposes of optimization, it suffices to eventually distinguish the three cases. For the purposes of regret minimization, however, an algorithm has to detect that the function curves between the two sampled points. To address this issue, we additionally sample at x = 1/2. 
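As a concrete illustration of the CI machinery (our own sketch, not code from the paper; the helper names query_mean and gamma_separated are ours), Hoeffding's inequality dictates averaging roughly 2\u03c3 log T /\u03b3^2 \u03c3-subgaussian observations before a CI of width \u03b3 is trustworthy:

```python
import math
import random

def query_mean(noisy_f, x, gamma, sigma, T):
    """Hoeffding-style estimate: averaging n = ceil(2*sigma*log(T)/gamma^2)
    sigma-subgaussian observations puts the average within gamma of f(x)
    with high probability, giving the CI [LB, UB] used by the algorithm."""
    n = max(1, math.ceil(2 * sigma * math.log(T) / gamma ** 2))
    est = sum(noisy_f(x) for _ in range(n)) / n
    return est - gamma, est + gamma  # (LB_gamma(x), UB_gamma(x))

def gamma_separated(ci_x, ci_y, gamma):
    """The paper's gamma-separation test for two confidence intervals."""
    (lbx, ubx), (lby, uby) = ci_x, ci_y
    return lbx >= uby + gamma or lby >= ubx + gamma
```

With this sample count the failure probability of a single CI is polynomially small in T, which is what lets the analysis condition on all CIs being correct.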
The center point acts as a sentinel: if it is recognized that f (1/2) is noticeably below the other two values, the region [0, 1/4] or [3/4, 1] can be discarded. Similarly, one of these regions can be discarded if it is recognized that the value of f at x = 1/4 or at x = 3/4 is greater than the others. Finally, if f at all three points appears to be similar at a given scale, we have a certificate (due to convexity) that the algorithm is not paying regret per query larger than this scale.\n\nThis center-point device, which allows the algorithm to quickly detect that the optimization method might be paying high regret and to act on this information, is the main novel tool of our paper. Unlike discretization-based methods, the proposed algorithm uses convexity in a crucial way. We first demonstrate the device on one-dimensional problems in the next section, where the solution is clean and intuitive. We then develop a version of the algorithm for higher dimensions, basing our construction on the beautiful zeroth-order optimization method of Nemirovski and Yudin [14]. Their method does not guarantee vanishing regret by itself, and a careful fusion of this algorithm with our center-point device is required.\n\n3 One-dimensional case\n\nWe start with the special case of one dimension to illustrate some of the key ideas, including the center-point device. We assume w.l.o.g. that the domain X = [0, 1] and f (x) \u2208 [0, 1] (the latter can be achieved by pinning f (x\u2217) = 0, since f is 1-Lipschitz).\n\n3.1 Algorithm description\n\nAlgorithm 1 One-dimensional stochastic convex bandit algorithm\ninput noisy black-box access to f : [0, 1] \u2192 R, total number of queries allowed T .\n1: Let l1 := 0 and r1 := 1.\n2: for epoch \u03c4 = 1, 2, . . . do\n3:   Let w\u03c4 := r\u03c4 \u2212 l\u03c4 .\n4:   Let xl := l\u03c4 + w\u03c4 /4, xc := l\u03c4 + w\u03c4 /2, and xr := l\u03c4 + 3w\u03c4 /4.\n5:   for round i = 1, 2, . . .
 do\n6:     Let \u03b3i := 2^\u2212i.\n7:     For each x \u2208 {xl, xc, xr}, query f (x) (2\u03c3 log T )/\u03b3i^2 times.\n8:     if max{LB\u03b3i(xl), LB\u03b3i(xr)} \u2265 min{UB\u03b3i(xl), UB\u03b3i(xr)} + \u03b3i then\n9:       {Case 1: CI's at xl and xr are \u03b3i-separated}\n10:      if LB\u03b3i(xl) \u2265 LB\u03b3i(xr) then let l\u03c4+1 := xl and r\u03c4+1 := r\u03c4 .\n11:      if LB\u03b3i(xl) < LB\u03b3i(xr) then let l\u03c4+1 := l\u03c4 and r\u03c4+1 := xr.\n12:      Continue to epoch \u03c4 + 1.\n13:    else if max{LB\u03b3i(xl), LB\u03b3i(xr)} \u2265 UB\u03b3i(xc) + \u03b3i then\n14:      {Case 2: CI's at xc and xl or xr are \u03b3i-separated}\n15:      if LB\u03b3i(xl) \u2265 LB\u03b3i(xr) then let l\u03c4+1 := xl and r\u03c4+1 := r\u03c4 .\n16:      if LB\u03b3i(xl) < LB\u03b3i(xr) then let l\u03c4+1 := l\u03c4 and r\u03c4+1 := xr.\n17:      Continue to epoch \u03c4 + 1.\n18:    end if\n19:  end for\n20: end for\n\nAlgorithm 1 proceeds in a series of epochs demarcated by a working feasible region (the interval X\u03c4 = [l\u03c4 , r\u03c4 ] in epoch \u03c4 ). In each epoch, the algorithm aims to discard a portion of X\u03c4 determined to contain only suboptimal points. To do this, the algorithm repeatedly makes noisy queries to f at three different points in X\u03c4 . Each epoch is further subdivided into rounds, where we query the function (2\u03c3 log T )/\u03b3i^2 times in round i at each of the points. By Hoeffding's inequality, this implies that we know each function value to within \u03b3i with high probability. The value \u03b3i is halved at every round. At the end of an epoch \u03c4 , X\u03c4 is reduced to a subset X\u03c4+1 = [l\u03c4+1, r\u03c4+1] \u2282 [l\u03c4 , r\u03c4 ] of the current region for the next epoch \u03c4 + 1, and this reduction is such that the new region is smaller in size by a constant fraction. 
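The epoch/round logic can be sketched as follows (our illustrative Python, not an artifact of the paper). To keep the sketch short we use a noiseless oracle, so exact values of f stand in for the Hoeffding averages, and the stopping scale tol is an ad hoc substitute for the query budget:

```python
def one_dim_bandit_sketch(f, tol=1e-3, max_epochs=60):
    """Noiseless sketch of Algorithm 1: with exact function values,
    LB(x) = f(x) - gamma and UB(x) = f(x) + gamma play the role of the CIs."""
    l, r = 0.0, 1.0
    for _ in range(max_epochs):
        w = r - l
        if w < tol:
            break
        xl, xc, xr = l + w / 4, l + w / 2, l + 3 * w / 4
        gamma = 0.5
        while True:
            LB = {x: f(x) - gamma for x in (xl, xc, xr)}
            UB = {x: f(x) + gamma for x in (xl, xc, xr)}
            # Cases 1/2: ends separated, or center clearly below an end.
            if (max(LB[xl], LB[xr]) >= min(UB[xl], UB[xr]) + gamma
                    or max(LB[xl], LB[xr]) >= UB[xc] + gamma):
                # Discard the quartile under the larger lower bound.
                if LB[xl] >= LB[xr]:
                    l = xl
                else:
                    r = xr
                break
            gamma /= 2           # Case 3: f looks flat at this scale; refine.
            if gamma < tol:      # sketch-only stopping rule
                return (l + r) / 2
    return (l + r) / 2
```

On a convex function such as f(x) = (x \u2212 0.3)^2 the returned point lands near the minimizer, while only quartiles certified to be suboptimal are ever discarded.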
This geometric rate of reduction guarantees that only a small number of epochs can occur before X\u03c4 contains only near-optimal points.\n\nFor the algorithm to identify a sizable portion of X\u03c4 to discard, the queries in each epoch should be suitably chosen, and the convexity of f must be exploited. To this end, the algorithm makes its queries at three equally-spaced points xl < xc < xr in X\u03c4 (see Section 4.1 of the full-length version for graphical illustrations of these cases).\n\nCase 1: If the CIs around f (xl) and f (xr) are sufficiently separated, the algorithm discards a quarter of [l\u03c4 , r\u03c4 ] (to the left of xl or right of xr) which does not contain x\u2217.\n\nCase 2: If the above separation fails, the algorithm checks if the CI around f (xc) is sufficiently below at least one of the other CIs (for f (xl) or f (xr)). If that happens, the algorithm again discards a quarter of [l\u03c4 , r\u03c4 ] that does not contain x\u2217.\n\nCase 3: Finally, if neither of the earlier cases holds, then the algorithm is assured (by convexity) that the function is sufficiently flat on X\u03c4 and hence it has not incurred much regret so far. The algorithm continues the epoch, with an increased number of queries to obtain smaller confidence intervals at each of the three points.\n\n3.2 Analysis\n\nThe analysis of Algorithm 1 relies on the function values being contained in the confidence intervals we construct at each round of each epoch. To avoid having probabilities throughout our analysis, we define an event E where at each epoch \u03c4 and each round i, f (x) \u2208 [LB\u03b3i(x), UB\u03b3i(x)] for x \u2208 {xl, xc, xr}. We will carry out the remainder of the analysis conditioned on E and bound the probability of E^c at the end.\n\nThe following theorem bounds the regret incurred by Algorithm 1. We note that the regret is measured in terms of the points xt queried by the algorithm at each time t. 
Within any given round, the order of queries is immaterial to the regret.\n\nTheorem 1 (Regret bound for Algorithm 1). Suppose Algorithm 1 is run on a convex, 1-Lipschitz function f bounded in [0, 1]. Suppose the noise in observations is i.i.d. and \u03c3-subgaussian. Then with probability at least 1 \u2212 1/T we have\n\n\u2211_{t=1}^{T} f (xt) \u2212 f (x\u2217) \u2264 108 \u221a(\u03c3T log T ) log_{4/3}(T /(8\u03c3 log T )).\n\nRemarks: As stated, Algorithm 1 and Theorem 1 assume knowledge of T , but we can make the algorithm adaptive to T by a standard doubling argument. We remark that \u2126(\u221aT ) is the smallest possible regret for any algorithm, even with noisy gradient information. Hence, this result shows that for purposes of regret, noisy zeroth-order information is no worse than noisy first-order information apart from logarithmic factors.\n\nTheorem 1 is proved via a series of lemmas below. The key idea is to show that the regret on any epoch is small and the total number of epochs is bounded. To bound the per-epoch regret, we will show that the total number of queries made on any epoch depends on how flat the function is on X\u03c4 . We either take a long time, but the function is very flat, or we stop early when the function has sufficient slope, never accruing too much regret. We start by showing that the reduction in X\u03c4 after each epoch always preserves near-optimal points.\n\nLemma 1 (Survival of approx. minima). If epoch \u03c4 ends in round i, then [l\u03c4+1, r\u03c4+1] contains every x \u2208 [l\u03c4 , r\u03c4 ] such that f (x) \u2264 f (x\u2217) + \u03b3i. In particular, x\u2217 \u2208 [l\u03c4 , r\u03c4 ] for all \u03c4 .\n\nThe next two lemmas bound the regret incurred in any single epoch. 
To show this, we first establish that the algorithm incurs low regret in a round as long as it does not end an epoch. Then, as a consequence of the doubling trick, we show that the regret incurred in an epoch is of the same order as that incurred in the last round of the epoch.\n\nLemma 2 (Certificate of low regret). If epoch \u03c4 continues from round i to round i + 1, then the regret incurred in round i is at most 72\u03b3i^\u22121 \u03c3 log T .\n\nLemma 3 (Regret in an epoch). If epoch \u03c4 ends in round i, then the regret incurred in the entire epoch is at most 216\u03b3i^\u22121 \u03c3 log T .\n\nTo obtain a bound on the overall regret, we bound the number of epochs that can occur before X\u03c4 contains only near-optimal points. The final regret bound is simply the product of the number of epochs and the regret incurred in any single epoch.\n\nLemma 4 (Bound on the number of epochs). The total number of epochs \u03c4 performed by Algorithm 1 is bounded as \u03c4 \u2264 (1/2) log_{4/3}(T /(8\u03c3 log T )).\n\n4 Algorithm for optimization in higher dimensions\n\nWe now present the general algorithm that works in d dimensions. The natural approach would be to try to generalize Algorithm 1 to work in multiple dimensions. However, the obvious extension requires querying the function along every direction in a covering of the unit sphere so that we know the behavior of the function along every direction. Such an approach yields regret and running time that scale exponentially with the dimension d. Nemirovski and Yudin [14] address this problem in the setup of zeroth-order optimization by a clever construction that captures all the directions in polynomially many queries. 
We define a pyramid to be a d-dimensional polyhedron defined by d + 1 points: d points form a regular polygon that is the base of the pyramid (lying in a (d \u2212 1)-dimensional hyperplane), and the apex lies above the hyperplane containing the base (see Figure 1 for a graphical illustration in 3 dimensions). The idea of Nemirovski and Yudin is to build a sequence of pyramids, each capturing the variation of the function in certain directions, in such a way that with O(d log d) pyramids we can explore all the directions. However, as mentioned earlier, their approach fails to give a low regret. We combine their geometric construction with ideas from the one-dimensional case to obtain Algorithm 2, which incurs a bounded regret.\n\nFigure 1: Pyramid in 3 dimensions\n\nFigure 2: The regular simplex constructed at round i of epoch \u03c4 with radius r\u03c4 , center x0 and vertices x1, . . . , xd+1.\n\nJust as in the one-dimensional case, Algorithm 2 proceeds in epochs. We start with the optimization domain X , and at the beginning we set X1 = X . At the beginning of epoch \u03c4 , we have a current feasible set X\u03c4 which contains at least one approximate optimum of the convex function. The epoch ends with discarding some portion of the set X\u03c4 in such a way that we still retain at least one approximate optimum in the remaining set X\u03c4+1.\n\nAt the start of epoch \u03c4 , we apply an affine transformation to X\u03c4 so that the smallest-volume ellipsoid containing it is a Euclidean ball of radius R\u03c4 (denoted B(R\u03c4 )). We define r\u03c4 = R\u03c4 /(c1d) for a constant c1 \u2265 1, so that B(r\u03c4 ) \u2286 X\u03c4 (see e.g. Lecture 1, p. 2 of [5]). We will use the notation B\u03c4 to refer to the enclosing ball. 
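The regular simplex used at the start of each round can be built with a standard construction (our sketch; the paper does not prescribe one): centering the standard basis of R^{d+1} yields d + 1 equidistant points of equal norm spanning a d-dimensional subspace, and any isometry onto R^d gives the desired vertices on the surface of B(r\u03c4 ).

```python
import math

def regular_simplex(d, radius=1.0):
    """d+1 vertices of a regular d-simplex at the given radius, built by
    centering the standard basis of R^{d+1}. The points lie in a d-dimensional
    subspace; composing with any isometry onto R^d yields the simplex the
    algorithm queries."""
    pts = []
    for i in range(d + 1):
        v = [-1.0 / (d + 1)] * (d + 1)  # subtract the centroid of the basis
        v[i] += 1.0
        norm = math.sqrt(sum(c * c for c in v))  # = sqrt(d/(d+1)) for every i
        pts.append([radius * c / norm for c in v])
    return pts
```

The three properties the algorithm relies on are immediate to check numerically: all vertices have the prescribed norm, all pairwise distances agree, and the centroid is the origin.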
Within each epoch, the algorithm proceeds in several rounds, each round maintaining a value \u03b3i which is successively halved.\n\nAlgorithm 2 Stochastic convex bandit algorithm\ninput feasible region X \u2282 R^d; noisy black-box access to f : X \u2192 R; constants c1 and c2; functions \u2206\u03c4 (\u03b3) and \u00af\u2206\u03c4 (\u03b3); number of queries T allowed.\n1: Let X1 := X .\n2: for epoch \u03c4 = 1, 2, . . . do\n3:   Round X\u03c4 so B(r\u03c4 ) \u2286 X\u03c4 \u2286 B(R\u03c4 ), R\u03c4 is minimized, and r\u03c4 := R\u03c4 /(c1d). Let B\u03c4 := B(R\u03c4 ).\n4:   Construct a regular simplex with vertices x1, . . . , xd+1 on the surface of B(r\u03c4 ).\n5:   for round i = 1, 2, . . . do\n6:     Let \u03b3i := 2^\u2212i.\n7:     Query f at xj for each j = 1, . . . , d + 1, (2\u03c3 log T )/\u03b3i^2 times.\n8:     Let y1 := arg max_j LB\u03b3i(xj).\n9:     for pyramid k = 1, 2, . . . do\n10:      Construct pyramid \u03a0k with apex yk; let z1, . . . , zd be the vertices of the base of \u03a0k and z0 the center of \u03a0k.\n11:      Let \u03b3\u0302 := 2^\u22121.\n12:      loop\n13:        Query f at each of {yk, z0, z1, . . . , zd} (2\u03c3 log T )/\u03b3\u0302^2 times.\n14:        Let center := z0, apex := yk, top be the vertex v of \u03a0k maximizing LB\u03b3\u0302(v), and bottom be the vertex v of \u03a0k minimizing LB\u03b3\u0302(v).\n15:        if LB\u03b3\u0302(top) \u2265 UB\u03b3\u0302(bottom) + \u2206\u03c4 (\u03b3\u0302) and LB\u03b3\u0302(top) \u2265 UB\u03b3\u0302(apex) + \u03b3\u0302 then {Case (1a)}\n16:          Let yk+1 := top, and immediately continue to pyramid k + 1.\n17:        else if LB\u03b3\u0302(top) \u2265 UB\u03b3\u0302(bottom) + \u2206\u03c4 (\u03b3\u0302) and LB\u03b3\u0302(top) < UB\u03b3\u0302(apex) + \u03b3\u0302 then {Case (1b)}\n18:          Set (X\u03c4+1, B\u2032\u03c4+1) := Cone-cutting(\u03a0k, X\u03c4 , B\u03c4 ), and proceed to epoch \u03c4 + 1.\n19:        else if LB\u03b3\u0302(top) < UB\u03b3\u0302(bottom) + \u2206\u03c4 (\u03b3\u0302) and UB\u03b3\u0302(center) \u2265 LB\u03b3\u0302(bottom) \u2212 \u00af\u2206\u03c4 (\u03b3\u0302) then {Case (2a)}\n20:          Let \u03b3\u0302 := \u03b3\u0302/2.\n21:          if \u03b3\u0302 < \u03b3i then start next round i + 1.\n22:        else if LB\u03b3\u0302(top) < UB\u03b3\u0302(bottom) + \u2206\u03c4 (\u03b3\u0302) and UB\u03b3\u0302(center) < LB\u03b3\u0302(bottom) \u2212 \u00af\u2206\u03c4 (\u03b3\u0302) then {Case (2b)}\n23:          Set (X\u03c4+1, B\u2032\u03c4+1) := Hat-raising(\u03a0k, X\u03c4 , B\u03c4 ), and proceed to epoch \u03c4 + 1.\n24:        end if\n25:      end loop\n26:    end for\n27:  end for\n28: end for\n\nAlgorithm 3 Cone-cutting\ninput pyramid \u03a0 with apex y, (rounded) feasible region X\u03c4 for epoch \u03c4 , enclosing ball B\u03c4 .\n1: Let z1, . . . , zd be the vertices of the base of \u03a0, and \u00af\u03d5 the angle at its apex.\n2: Define the cone\n\nK\u03c4 = {x | \u2203\u03bb > 0, \u03b11, . . . , \u03b1d > 0, \u2211_{i=1}^{d} \u03b1i = 1 : x = y \u2212 \u03bb \u2211_{i=1}^{d} \u03b1i(zi \u2212 y)}\n\n3: Set B\u2032\u03c4+1 to be the minimum-volume ellipsoid containing B\u03c4 \\ K\u03c4 .\n4: Set X\u03c4+1 := X\u03c4 \u2229 B\u2032\u03c4+1.\noutput new feasible region X\u03c4+1 and enclosing ellipsoid B\u2032\u03c4+1.\n\nAlgorithm 4 Hat-raising\ninput pyramid \u03a0 with apex y, (rounded) feasible region X\u03c4 for epoch \u03c4 , enclosing ball B\u03c4 .\n1: Let center be the center of \u03a0.\n2: Set y\u2032 := y + (y \u2212 center).\n3: Set \u03a0\u2032 to be the pyramid with apex y\u2032 and the same base as \u03a0.\n4: Set (X\u03c4+1, B\u2032\u03c4+1) := Cone-cutting(\u03a0\u2032, X\u03c4 , B\u03c4 ).\noutput new feasible region X\u03c4+1 and enclosing ellipsoid B\u2032\u03c4+1.\n\nFigure 3: Sequence of pyramids constructed by Algorithm 2\n\nLet x0 be the center of the ball B(R\u03c4 ) containing X\u03c4 . At the start of a round i, we construct a regular simplex centered at x0 and contained in B(r\u03c4 ). The algorithm queries the function f at all the vertices of the simplex, denoted by x1, . . . , xd+1, until the CI's at each vertex shrink to \u03b3i. The algorithm picks the point y1 that maximizes LB\u03b3i(xj). By construction, f (y1) \u2265 f (xj) \u2212 \u03b3i for all j = 1, . . . , d + 1. This step is depicted in Figure 2.\n\nThe algorithm now successively constructs a sequence of pyramids, with the goal of identifying a region of the feasible set X\u03c4 such that at least one approximate optimum of f lies outside the selected region. This region will be discarded at the end of the epoch. The construction of the pyramids follows the construction from Section 9.2.2 of Nemirovski and Yudin [14]. The pyramids we construct will have an angle 2\u03d5 at the apex, where cos \u03d5 = c2/d. The base of the pyramid consists of vertices z1, . . . , zd such that zi \u2212 x0 and y1 \u2212 zi are orthogonal. 
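The orthogonality property can be realized with the Thales-sphere construction described next in the text. A minimal coordinate sketch (ours, hard-coded to R^3 with x0 at the origin and the apex on the z-axis; the angle value used below is illustrative):

```python
import math

def pyramid_base(h, cos_phi, d=3):
    """Sketch of the pyramid-base construction: x0 is the origin and the apex
    is y1 = (0, 0, h). Each base vertex z_j lies on the sphere with diameter
    [x0, y1], at angle phi between y1 - x0 and y1 - z_j; by Thales' theorem
    this forces (z_j - x0) to be orthogonal to (y1 - z_j)."""
    t = h * cos_phi                        # chord length from y1 at angle phi
    sin_phi = math.sqrt(1.0 - cos_phi ** 2)
    base = []
    for j in range(d):                     # spread the d base vertices evenly
        a = 2.0 * math.pi * j / d
        base.append((t * sin_phi * math.cos(a),
                     t * sin_phi * math.sin(a),
                     h - t * cos_phi))
    return base
```

A short calculation confirms the claim: for z = y1 + t\u00b7dir with t = h cos \u03c6, the inner product of z \u2212 x0 and y1 \u2212 z equals t(h cos \u03c6 \u2212 t) = 0.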
We note that the construction of such a pyramid is always possible\u2014we take a sphere with y1 \u2212 x0 as the diameter, and arrange z1, . . . , zd on the boundary of the sphere such that the angle between y1 \u2212 x0 and y1 \u2212 zi is \u03d5. The construction of the pyramid is depicted in Figure 3. Given this pyramid, we set \u03b3\u0302 = 1, and sample the function at y1 and z1, . . . , zd as well as the center of the pyramid until the CI's all shrink to \u03b3\u0302. Let top and bottom denote the vertices of the pyramid (including y1) with the largest and the smallest function value estimates, resp. For consistency, we will also use apex to denote the apex y1. We then check for one of the following conditions (see Section 5 of the full-length version [2] for graphical illustrations of these cases):\n\n(1) If LB\u03b3\u0302(top) \u2265 UB\u03b3\u0302(bottom) + \u2206\u03c4 (\u03b3\u0302), we proceed based on the separation between the top and apex CI's.\n\n(a) If LB\u03b3\u0302(top) \u2265 UB\u03b3\u0302(apex) + \u03b3\u0302, then we know that with high probability\n\nf (top) \u2265 f (apex) + \u03b3\u0302 \u2265 f (apex) + \u03b3i.   (2)\n\nIn this case, we set top to be the apex of the next pyramid, reset \u03b3\u0302 = 1, and continue the sampling procedure on the next pyramid.\n\n(b) If LB\u03b3\u0302(top) \u2264 UB\u03b3\u0302(apex) + \u03b3\u0302, then we know that LB\u03b3\u0302(apex) \u2265 UB\u03b3\u0302(bottom) + \u2206\u03c4 (\u03b3\u0302) \u2212 2\u03b3\u0302. In this case, we declare the epoch over and pass the current apex to the cone-cutting step.\n\n(2) If LB\u03b3\u0302(top) \u2264 UB\u03b3\u0302(bottom) + \u2206\u03c4 (\u03b3\u0302), then one of the following happens:\n\n(a) If UB\u03b3\u0302(center) \u2265 LB\u03b3\u0302(bottom) \u2212 \u00af\u2206\u03c4 (\u03b3\u0302), then all of the vertices and the center of the pyramid have their function values within a 2\u00af\u2206\u03c4 (\u03b3\u0302) + 3\u03b3\u0302 interval. In this case, we set \u03b3\u0302 = \u03b3\u0302/2. If this sets \u03b3\u0302 < \u03b3i, we start the next round with \u03b3i+1 = \u03b3i/2. Otherwise, we continue sampling the current pyramid with the new value of \u03b3\u0302.\n\n(b) If UB\u03b3\u0302(center) \u2264 LB\u03b3\u0302(bottom) \u2212 \u00af\u2206\u03c4 (\u03b3\u0302), then we terminate the epoch and pass the center and the current apex to the hat-raising step.\n\nHat-Raising: This step happens when the algorithm enters case 2(b). In this case, we will show that if we move the apex of the pyramid a little from yi to y\u2032i, then y\u2032i's CI is above the top CI while the angle of the new pyramid at y\u2032i is not much smaller than \u03d5. Letting centeri denote the center of the pyramid, we set y\u2032i = yi + (yi \u2212 centeri) and denote the angle at the apex y\u2032i by 2\u00af\u03d5. Figure 4 shows the transformation involved in this step.\n\nFigure 4: Transformation of the pyramid \u03a0 in the hat-raising step.\n\nFigure 5: Cone-cutting step at epoch \u03c4 . The solid circle is the enclosing ball B\u03c4 . The shaded region is the intersection of K\u03c4 with B\u03c4 . The dotted ellipsoid is the new enclosing ellipsoid B\u2032\u03c4+1.\n\nCone-cutting: This step concludes an epoch. The algorithm gets here either through case 1(b) or through the hat-raising step. In either case, we have a pyramid with an apex y, base z1, . . . , zd and an angle 2\u00af\u03d5 at the apex, where cos(\u00af\u03d5) \u2264 2c2/d. We now define a cone\n\nK\u03c4 = {x | \u2203\u03bb > 0, \u03b11, . . . , \u03b1d > 0, \u2211_{i=1}^{d} \u03b1i = 1 : x = y \u2212 \u03bb \u2211_{i=1}^{d} \u03b1i(zi \u2212 y)}   (3)\n\nwhich is centered at y and is a reflection of the pyramid around the apex. 
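The two set-shrinking primitives have simple algebraic cores, sketched below in our own (hypothetically named) helpers: hat_raise reflects the pyramid's center through its apex as in Algorithm 4, and cone_point generates points of the cone K from Eq. (3) for given positive weights.

```python
def hat_raise(apex, center):
    """Hat-raising: y' = y + (y - center), i.e. the pyramid's center
    reflected through its apex; the base of the pyramid stays fixed."""
    return tuple(2 * a - c for a, c in zip(apex, center))

def cone_point(apex, base, lam, alphas):
    """A point of the cone K of Eq. (3): x = y - lam * sum_i alpha_i (z_i - y),
    for lam > 0 and positive alphas summing to one."""
    assert lam > 0 and all(a > 0 for a in alphas)
    assert abs(sum(alphas) - 1.0) < 1e-12
    return tuple(apex[k] - lam * sum(a * (z[k] - apex[k]) for a, z in zip(alphas, base))
                 for k in range(len(apex)))
```

Sweeping lam over positive values and alphas over the simplex traces out exactly the reflected cone that cone-cutting removes from the enclosing ball.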
By construction, the cone K\u03c4 has an angle 2\u00af\u03d5 at its apex. We set B\u2032\u03c4+1 to be the ellipsoid of minimum volume containing B\u03c4 \\ K\u03c4 and define X\u03c4+1 = X\u03c4 \u2229 B\u2032\u03c4+1. This is illustrated in Figure 5. Finally, we put things back into an isotropic position: B\u03c4+1 is the ball containing X\u03c4+1 in the isotropic coordinates, obtained by applying an affine transformation to B\u2032\u03c4+1.\n\nLet us end with a brief discussion of the computational aspects of this algorithm. Clearly, the most computationally intensive steps are cone-cutting and the isotropic transformation at the end. However, these are exactly analogous to the classical ellipsoid method. In particular, the equation for B\u2032\u03c4+1 is known in closed form [11]. Furthermore, the affine transformations needed to reshape the set can be computed via rank-one matrix updates, and hence computation of inverses can be done efficiently as well (see e.g. [11] for the relevant implementation details of the ellipsoid method).\n\nThe following theorem states our regret guarantee on the performance of Algorithm 2.\n\nTheorem 2. Suppose Algorithm 2 is run with c1 \u2265 64, c2 \u2264 1/32 and parameters\n\n\u2206\u03c4 (\u03b3) = (6c1d^4/c2^2 + 3)\u03b3 and \u00af\u2206\u03c4 (\u03b3) = (2d^2 log d/c2^2 + 5)\u03b3.\n\nThen with probability at least 1 \u2212 1/T , the regret incurred by the algorithm is bounded by\n\n768d^3\u03c3\u221aT log^2 T ((2d^2 log d/c2^2 + 5)(4d^7c1/c2^3 + 11) + (6c1d^4/c2^2 + 3)(12c1d^4/c2^2 + 1) + d(d + 1)) = \u02dcO(d^16\u221aT ).\n\nRemarks: Theorem 2 is again optimal in the dependence on T . 
The large dependence on d is also seen in Nemirovski and Yudin [14], who obtain a d^7 scaling in the noiseless case and leave it as an unspecified polynomial in the noisy case. Using random-walk ideas [6] to improve the dependence on d is an interesting question for future research.\n\nAcknowledgments\n\nPart of this work was done while AA and DH were at the University of Pennsylvania. AA was partially supported by MSR and Google PhD fellowships and NSF grant CCF-1115788 while this work was done. DH was partially supported under grants AFOSR FA9550-09-1-0425, NSF IIS-1016061, and NSF IIS-713540. AR gratefully acknowledges the support of NSF under grant CAREER DMS-0954737.\n\nReferences\n\n[1] A. Agarwal, O. Dekel, and L. Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, 2010.\n\n[2] A. Agarwal, D. Foster, D. Hsu, S. Kakade, and A. Rakhlin. Stochastic convex optimization with bandit feedback. URL http://arxiv.org/abs/1107.1744, 2011.\n\n[3] R. Agrawal. The continuum-armed bandit problem. SIAM Journal on Control and Optimization, 33:1926, 1995.\n\n[4] P. Auer, R. Ortner, and C. Szepesv\u00e1ri. Improved rates for the stochastic continuum-armed bandit problem. In Learning Theory, pages 454\u2013468, 2007.\n\n[5] K. Ball. An elementary introduction to modern convex geometry. In Flavors of Geometry, number 31 in Publications of the Mathematical Sciences Research Institute, pages 1\u201355, 1997.\n\n[6] D. Bertsimas and S. Vempala. Solving convex programs by random walks. Journal of the ACM, 51(4):540\u2013556, 2004.\n\n[7] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesv\u00e1ri. X-armed bandits. Journal of Machine Learning Research, 12:1655\u20131695, 2011.\n\n[8] E. W. Cope. Regret and convergence bounds for a class of continuum-armed bandit problems. 
IEEE Transactions on Automatic Control, 54(6):1243\u20131253, 2009.\n\n[9] V. Dani, T. P. Hayes, and S. M. Kakade. Stochastic linear optimization under bandit feedback. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), 2008.\n\n[10] A. D. Flaxman, A. T. Kalai, and B. H. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385\u2013394, 2005.\n\n[11] Donald Goldfarb and Michael J. Todd. Modifications and implementation of the ellipsoid algorithm for linear programming. Mathematical Programming, 23:1\u201319, 1982.\n\n[12] R. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. Advances in Neural Information Processing Systems, 18, 2005.\n\n[13] R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandits in metric spaces. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 681\u2013690. ACM, 2008.\n\n[14] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.\n\n[15] Y. Nesterov. Random gradient-free minimization of convex functions. Technical Report 2011/1, CORE DP, 2011.\n\n[16] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. arXiv preprint arXiv:0912.3995, 2009.\n\n[17] J. Y. Yu and S. Mannor. Unimodal bandits. In ICML, 2011.\n", "award": [], "sourceid": 641, "authors": [{"given_name": "Alekh", "family_name": "Agarwal", "institution": null}, {"given_name": "Dean", "family_name": "Foster", "institution": null}, {"given_name": "Daniel", "family_name": "Hsu", "institution": null}, {"given_name": "Sham", "family_name": "Kakade", "institution": null}, {"given_name": "Alexander", "family_name": "Rakhlin", "institution": null}]}