{"title": "Hardness of Online Sleeping Combinatorial Optimization Problems", "book": "Advances in Neural Information Processing Systems", "page_first": 2181, "page_last": 2189, "abstract": "We show that several online combinatorial optimization problems that admit efficient no-regret algorithms become computationally hard in the sleeping setting where a subset of actions becomes unavailable in each round. Specifically, we show that the sleeping versions of these problems are at least as hard as PAC learning DNF expressions, a long standing open problem. We show hardness for the sleeping versions of Online Shortest Paths, Online Minimum Spanning Tree, Online k-Subsets, Online k-Truncated Permutations, Online Minimum Cut, and Online Bipartite Matching. The hardness result for the sleeping version of the Online Shortest Paths problem resolves an open problem presented at COLT 2015 [Koolen et al., 2015].", "full_text": "Hardness of Online Sleeping Combinatorial\n\nOptimization Problems\n\nSatyen Kale\u2217\u2020\nYahoo Research\n\nsatyen@satyenkale.com\n\nChansoo Lee\u2020\n\nUniv. of Michigan, Ann Arbor\n\nchansool@umich.edu\n\nD\u00b4avid P\u00b4al\n\nYahoo Research\n\ndpal@yahoo-inc.com\n\nAbstract\n\nWe show that several online combinatorial optimization problems that admit ef-\n\ufb01cient no-regret algorithms become computationally hard in the sleeping setting\nwhere a subset of actions becomes unavailable in each round. Speci\ufb01cally, we\nshow that the sleeping versions of these problems are at least as hard as PAC learn-\ning DNF expressions, a long standing open problem. We show hardness for the\nsleeping versions of ONLINE SHORTEST PATHS, ONLINE MINIMUM SPANNING\nTREE, ONLINE k-SUBSETS, ONLINE k-TRUNCATED PERMUTATIONS, ONLINE\nMINIMUM CUT, and ONLINE BIPARTITE MATCHING. 
The hardness result for the sleeping version of the Online Shortest Paths problem resolves an open problem presented at COLT 2015 [Koolen et al., 2015].\n\n1 Introduction\n\nOnline learning is a sequential decision-making problem in which the learner repeatedly chooses an action in response to adversarially chosen losses for the available actions. The goal of the learner is to minimize the regret, defined as the difference between the total loss of the algorithm and the loss of the best fixed action in hindsight. In online combinatorial optimization, the actions are subsets of a ground set of elements (also called components) with some combinatorial structure. The loss of an action is the sum of the losses of its elements. A particularly well-studied instance is the ONLINE SHORTEST PATH problem [Takimoto and Warmuth, 2003] on a graph, in which the actions are the paths between two fixed vertices and the elements are the edges.\nWe study a sleeping variant of online combinatorial optimization where the adversary chooses not only the losses but also the availability of the elements in every round. The unavailable elements are called sleeping or sabotaged. In the ONLINE SABOTAGED SHORTEST PATH problem, for example, the adversary specifies unavailable edges every round, and consequently the learner cannot choose any path using those edges. A straightforward application of the sleeping experts algorithm proposed by Freund et al. [1997] gives a no-regret learner, but it takes exponential time (in the input graph size) every round. The design of a computationally efficient no-regret algorithm for the ONLINE SABOTAGED SHORTEST PATH problem was presented as an open problem at COLT 2015 by Koolen et al. [2015].\nIn this paper, we resolve this open problem and prove that the ONLINE SABOTAGED SHORTEST PATH problem is computationally hard. 
Specifically, we show that a polynomial-time low-regret algorithm for this problem implies a polynomial-time algorithm for PAC learning DNF expressions, which is a long-standing open problem. The best known algorithm for PAC learning DNF expressions on n variables has time complexity 2^{\u00d5(n^{1/3})} [Klivans and Servedio, 2001].\n\n\u2217Current affiliation: Google Research.\n\u2020This work was done while the authors were at Yahoo Research.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nOur reduction framework (Section 4) in fact shows a general result: any online sleeping combinatorial optimization problem with two simple structural properties is as hard as PAC learning DNF expressions. Leveraging this result, we obtain hardness results for the sleeping variants of well-studied online combinatorial optimization problems for which a polynomial-time no-regret algorithm exists: ONLINE MINIMUM SPANNING TREE, ONLINE k-SUBSETS, ONLINE k-TRUNCATED PERMUTATIONS, ONLINE MINIMUM CUT, and ONLINE BIPARTITE MATCHING (Section 5).\nOur hardness result applies to the worst-case adversary as well as to a stochastic adversary, who draws an i.i.d. sample every round from a fixed (but unknown to the learner) joint distribution over availabilities and losses. This implies that no-regret algorithms would require even stronger restrictions on the adversary.\n\n1.1 Related Work\n\nOnline Combinatorial Optimization. The standard problem of online linear optimization with d actions (Experts setting) admits algorithms with O(d) running time per round and O(\u221a(T log d)) regret after T rounds [Littlestone and Warmuth, 1994, Freund and Schapire, 1997], which is minimax optimal [Cesa-Bianchi and Lugosi, 2006, Chapter 2]. 
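The Experts-setting guarantee just cited is achieved by exponential-weights updates; as a concrete illustration, here is a minimal sketch of standard Hedge with an assumed fixed learning rate eta (our own code, not from the paper):

```python
import math

# Minimal Hedge / exponential-weights sketch for the d-action Experts setting.
# losses: one list of d per-action losses (each in [-1, 1]) per round.
# Returns the algorithm's total expected loss; each round costs O(d) time.
def hedge(losses, eta):
    d = len(losses[0])
    weights = [1.0] * d
    total = 0.0
    for round_losses in losses:
        w_sum = sum(weights)
        probs = [w / w_sum for w in weights]  # play action i with probability probs[i]
        total += sum(p * l for p, l in zip(probs, round_losses))
        # multiplicative update: actions with smaller loss gain relative weight
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
    return total
```

Run with eta of order sqrt(log d / T), this update keeps the expected loss within O(sqrt(T log d)) of the best fixed action, matching the bound quoted above.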
A naive application of such algorithms to an online combinatorial optimization problem (precise definitions to be given momentarily) over a ground set of d elements will result in exp(O(d)) running time per round and O(\u221a(T d)) regret.\nDespite this, many online combinatorial optimization problems, such as the ones considered in this paper, admit algorithms with\u00b3 poly(d) running time per round and O(poly(d)\u221aT) regret [Takimoto and Warmuth, 2003, Kalai and Vempala, 2005, Koolen et al., 2010, Audibert et al., 2013]. In fact, Kalai and Vempala [2005] show that the existence of a polynomial-time algorithm for an offline combinatorial problem implies the existence of an algorithm for the corresponding online optimization problem with the same per-round running time and O(poly(d)\u221aT) regret.\n\nOnline Sleeping Optimization. In studying online sleeping optimization, three different notions of regret have been used: (a) policy regret, (b) ranking regret, and (c) per-action regret, in decreasing order of the computational hardness of achieving no regret. Policy regret is the total difference between the loss of the algorithm and the loss of the best policy, which maps a set of available actions and the observed loss sequence to an available action [Neu and Valko, 2014]. Ranking regret is the total difference between the loss of the algorithm and the loss of the best ranking of actions, which corresponds to a policy that chooses in each round the highest-ranked available action [Kleinberg et al., 2010, Kanade and Steinke, 2014, Kanade et al., 2009]. Per-action regret is the difference between the loss of the algorithm and the loss of an action, summed over only the rounds in which the action is available [Freund et al., 1997, Koolen et al., 2015]. 
Note that policy regret upper bounds ranking regret, and while ranking regret and per-action regret are generally incomparable, per-action regret is usually the smallest of the three notions.\nThe sleeping Experts (also known as Specialists) setting has been extensively studied in the literature [Freund et al., 1997, Kanade and Steinke, 2014]. In this paper we focus on the more general online sleeping combinatorial optimization problem, and in particular, on the per-action notion of regret.\nA summary of known results for online sleeping optimization problems is given in Figure 1. Note in particular that an efficient algorithm was known for minimizing per-action regret in the sleeping Experts problem [Freund et al., 1997]. We show in this paper that a similar efficient algorithm for minimizing per-action regret in online sleeping combinatorial optimization problems cannot exist, unless there is an efficient algorithm for learning DNFs. Our reduction technique is closely related to that of Kanade and Steinke [2014], who reduced agnostic learning of disjunctions to ranking regret minimization in the sleeping Experts setting.\n\n2 Preliminaries\n\nAn instance of online combinatorial optimization is defined by a ground set U of d elements, and a decision set D of actions, each of which is a subset of U. 
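As a concrete illustration of the objects about to be defined (actions as subsets of a ground set, additive losses, and awake actions under a sleeping set), here is a minimal sketch; the toy instance and all names below are ours, not from the paper:

```python
from itertools import combinations

# Loss of an action is the sum of the losses of its elements.
def action_loss(action, element_losses):
    return sum(element_losses[e] for e in action)

# An action is awake in a round iff it uses no sleeping element.
def awake_actions(decision_set, sleeping):
    return [a for a in decision_set if not (a & sleeping)]

# Toy instance: ground set {0, 1, 2, 3} with all 2-element subsets as actions
# (the ONLINE k-SUBSETS decision set with k = 2, d = 4).
ground_set = {0, 1, 2, 3}
decision_set = [frozenset(c) for c in combinations(sorted(ground_set), 2)]
```

For example, with sleeping set {0}, exactly the three 2-subsets avoiding element 0 are awake; per-action regret then compares the learner to a fixed action only over the rounds in which that action is awake.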
In each round t, the online learner is required to choose an action Vt \u2208 D, while simultaneously an adversary chooses a loss function \u2113t : U \u2192 [\u22121, 1]. The loss of any V \u2208 D is given by (with some abuse of notation)\n\n\u2113t(V) := \u2211_{e \u2208 V} \u2113t(e).\n\nThe learner suffers loss \u2113t(Vt) and obtains \u2113t as feedback. The regret of the learner with respect to an action V \u2208 D is defined to be\n\nRegret_T(V) := \u2211_{t=1}^{T} \u2113t(Vt) \u2212 \u2113t(V).\n\nWe say that an online optimization algorithm has a regret bound of f(d, T) if Regret_T(V) \u2264 f(d, T) for all V \u2208 D. We say that the algorithm has no regret if f(d, T) = poly(d)T^{1\u2212\u03b4} for some \u03b4 \u2208 (0, 1), and that it is computationally efficient if it has a per-round running time of order poly(d, T).\nWe now define an instance of online sleeping combinatorial optimization. In this setting, at the start of each round t, the adversary selects a set of sleeping elements St \u2286 U and reveals it to the learner. Define At = {V \u2208 D | V \u2229 St = \u2205}, the set of awake actions at round t; the remaining actions in D, called sleeping actions, are unavailable to the learner for that round. If At is empty, i.e., there are no awake actions, then the learner is not required to do anything for that round and the round is discarded from the computation of the regret.\n\n\u00b3In this paper, we use the poly(\u00b7) notation to indicate a polynomially bounded function of the arguments.\n\nFigure 1: Summary of known results.\n\nRegret notion | Sleeping Experts | Sleeping Combinatorial Opt.\nPolicy | Upper: O(\u221a(T log d)), under ILA [Kanade et al., 2009] | Upper: O(poly(d)\u221aT), under ILA [Neu and Valko, 2014, Abbasi-Yadkori et al., 2013]; Lower: \u2126(poly(d)T^{1\u2212\u03b4}), under SLA [Abbasi-Yadkori et al., 2013]\nRanking | Lower: \u2126(poly(d)T^{1\u2212\u03b4}), under SLA [Kanade and Steinke, 2014] | Lower: \u2126(exp(\u2126(d))\u221aT), under SLA [Easy construction, omitted]\nPer-action | Upper: O(\u221a(T log d)), adversarial setting [Freund et al., 1997] | Lower: \u2126(poly(d)T^{1\u2212\u03b4}), under SLA [This paper]\n\nThe Stochastic Losses and Availabilities (SLA) assumption is where the adversary chooses a joint distribution over losses and availabilities before the first round, and takes an i.i.d. sample every round. The Independent Losses and Availabilities (ILA) assumption is where the adversary chooses losses and availabilities independently of each other (one of the two may be adversarially chosen; the other one is then chosen i.i.d. in each round). Policy regret upper bounds ranking regret, which in turn upper bounds per-action regret for the problems of interest; hence some bounds carry over to other cells of the table by implication and are not shown for clarity. The lower bound on ranking regret in online sleeping combinatorial optimization is unconditional and holds for any algorithm, efficient or not. All other lower bounds are computational, i.e., they hold for polynomial-time algorithms, assuming intractability of certain well-studied learning problems, such as learning DNFs or learning noisy parities.\n\nFor the rest of the paper, unless noted otherwise, we use per-action regret as our performance measure. 
Per-action regret with respect to V \u2208 D is defined as:\n\nRegret_T(V) := \u2211_{t: V \u2208 At} \u2113t(Vt) \u2212 \u2113t(V). (1)\n\nIn other words, our notion of regret considers only the rounds in which V is awake.\nFor clarity, we define an online combinatorial optimization problem as a family of instances of online combinatorial optimization (and correspondingly for online sleeping combinatorial optimization). For example, the ONLINE SHORTEST PATH problem is the family of all instances on all graphs with designated source and sink vertices, where the decision set D is the set of paths from the source to the sink, and the elements are the edges of the graph.\nOur main result is that many natural online sleeping combinatorial optimization problems are unlikely to admit a computationally efficient no-regret algorithm, although their non-sleeping versions (i.e., At = D for all t) do. More precisely, we show that these online sleeping combinatorial optimization problems are at least as hard as PAC learning DNF expressions, a long-standing open problem.\n\n3 Online Agnostic Learning of Disjunctions\n\nInstead of directly reducing PAC learning DNF expressions to no-regret learning for online sleeping combinatorial optimization problems, we use an intermediate problem, online agnostic learning of disjunctions. By a standard online-to-batch conversion argument [Kanade and Steinke, 2014], online agnostic learning of disjunctions is at least as hard as agnostic improper PAC-learning of disjunctions [Kearns et al., 1994], which in turn is at least as hard as PAC-learning of DNF expressions [Kalai et al., 2012]. The online-to-batch conversion argument allows us to assume a stochastic adversary (i.i.d. input sequence) for online agnostic learning of disjunctions, which in turn implies that our reduction applies to online sleeping combinatorial optimization with a stochastic adversary.\nOnline agnostic learning of disjunctions is a repeated game between the adversary and a learning algorithm. Let n denote the number of variables in the disjunction. In each round t, the adversary chooses a vector xt \u2208 {0, 1}^n, the algorithm predicts a label \u0177t \u2208 {0, 1}, and then the adversary reveals the correct label yt \u2208 {0, 1}. If \u0177t \u2260 yt, we say that the algorithm makes an error.\nFor any predictor \u03c6 : {0, 1}^n \u2192 {0, 1}, we define the regret with respect to \u03c6 after T rounds as\n\nRegret_T(\u03c6) = \u2211_{t=1}^{T} 1[\u0177t \u2260 yt] \u2212 1[\u03c6(xt) \u2260 yt].\n\nOur goal is to design an algorithm that is competitive with any disjunction, i.e., for any disjunction \u03c6 over n variables, the regret is bounded by poly(n) \u00b7 T^{1\u2212\u03b4} for some \u03b4 \u2208 (0, 1). Recall that a disjunction over n variables is a boolean function \u03c6 : {0, 1}^n \u2192 {0, 1} that on an input x = (x(1), x(2), . . . , x(n)) outputs\n\n\u03c6(x) = (\u22c1_{i \u2208 P} x(i)) \u2228 (\u22c1_{i \u2208 N} \u00acx(i)),\n\nwhere P and N are disjoint subsets of {1, 2, . . . , n}. We allow either P or N to be empty, and the empty disjunction is interpreted as the constant 0 function. For any index i \u2208 {1, 2, . . . , n}, we call it a relevant index for \u03c6 if i \u2208 P \u222a N and an irrelevant index for \u03c6 otherwise. For any relevant index i, we call it positive if i \u2208 P and negative if i \u2208 N.\n\n4 General Hardness Result\n\nIn this section, we identify two combinatorial properties under which an online sleeping combinatorial optimization problem is computationally hard.\nDefinition 1. Let n be a positive integer. Consider an instance of online sleeping combinatorial optimization where the ground set U has d elements with 3n + 2 \u2264 d \u2264 poly(n). 
This instance is called a hard instance with parameter n if there exists a subset Us \u2286 U of size 3n + 2 and a bijection between Us and the set (i.e., a labeling of the elements of Us by the set)\n\n\u22c3_{i=1}^{n} {(i, 0), (i, 1), (i, \u22c6)} \u222a {0, 1},\n\nsuch that the decision set D satisfies the following properties:\n\n1. (Heaviness) Any action V \u2208 D has at least n + 1 elements in Us.\n2. (Richness) For all (s1, . . . , sn+1) \u2208 {0, 1, \u22c6}^n \u00d7 {0, 1}, the action {(1, s1), (2, s2), . . . , (n, sn), sn+1} \u2286 Us is in D.\n\nWe now show how to use the above definition of hard instances to prove the hardness of an online sleeping combinatorial optimization (OSCO) problem by reducing from the online agnostic learning of disjunctions (OALD) problem. At a high level, the reduction works as follows. Given an instance of the OALD problem, we construct a specific instance of the OSCO problem and a sequence of losses and availabilities based on the input to the OALD problem. This reduction has the property that for any disjunction, there is a special set of actions of size at most n + 1 such that (a) exactly one action in this set is available in any round and (b) the loss of this action exactly equals the loss of the disjunction on the current input example. Furthermore, the action chosen by the OSCO algorithm can be converted into a prediction in the OALD problem with only lesser or equal loss. These two facts imply that the regret of the OALD algorithm is at most n + 1 times the per-action regret of the OSCO algorithm.\n\nAlgorithm 1 ALGORITHM ALGDISJ FOR LEARNING DISJUNCTIONS\nRequire: An algorithm Algosco for the online sleeping combinatorial optimization problem, and the input size n for the disjunction learning problem.\n1: Construct a hard instance (U, D) with parameter n of the online sleeping combinatorial optimization problem, and run Algosco on it.\n2: for t = 1, 2, . . .
, T do\n3: Receive xt \u2208 {0, 1}^n.\n4: Set the set of sleeping elements for Algosco to be St = {(i, 1 \u2212 xt(i)) | i = 1, 2, . . . , n}.\n5: Obtain an action Vt \u2208 D by running Algosco such that Vt \u2229 St = \u2205.\n6: Set \u0177t = 1[0 \u2209 Vt].\n7: Predict \u0177t, and receive true label yt.\n8: In algorithm Algosco, set the loss of the awake elements e \u2208 U \\ St as follows:\n\n\u2113t(e) = { (1 \u2212 yt)/(n + 1) if e \u2260 0;  yt \u2212 n(1 \u2212 yt)/(n + 1) if e = 0. }\n\n9: end for\n\nTheorem 1. Consider an online sleeping combinatorial optimization problem such that for any positive integer n, there is a hard instance with parameter n of the problem. Suppose there is an algorithm Algosco that for any instance of the problem with ground set U of size d, runs in time poly(T, d) and has regret bounded by poly(d) \u00b7 T^{1\u2212\u03b4} for some \u03b4 \u2208 (0, 1). Then, there exists an algorithm Algdisj for online agnostic learning of disjunctions over n variables with running time poly(T, n) and regret poly(n) \u00b7 T^{1\u2212\u03b4}.\n\nProof. Algdisj is given in Algorithm 1. First, we note that in each round t, we have\n\n\u2113t(Vt) \u2265 1[yt \u2260 \u0177t]. (2)\n\nWe prove this separately for two different cases; in both cases, the inequality follows from the heaviness property, i.e., the fact that |Vt| \u2265 n + 1.\n\n1. If 0 \u2209 Vt, then the prediction of Algdisj is \u0177t = 1, and thus\n\u2113t(Vt) = |Vt| \u00b7 (1 \u2212 yt)/(n + 1) \u2265 1 \u2212 yt = 1[yt \u2260 \u0177t].\n2. If 0 \u2208 Vt, then the prediction of Algdisj is \u0177t = 0, and thus\n\u2113t(Vt) = (|Vt| \u2212 1) \u00b7 (1 \u2212 yt)/(n + 1) + (yt \u2212 n(1 \u2212 yt)/(n + 1)) \u2265 yt = 1[yt \u2260 \u0177t].\n\nNote that if Vt satisfies the equality |Vt| = n + 1, then we have the equality \u2113t(Vt) = 1[yt \u2260 \u0177t]; this property will be useful later.\nNext, let \u03c6 be an arbitrary disjunction, and let i_1 < i_2 < \u00b7\u00b7\u00b7 < i_m be its relevant indices sorted in increasing order. Define f\u03c6 : {1, 2, . . . , m} \u2192 {0, 1} as f\u03c6(j) := 1[i_j is a positive index for \u03c6], and define the set of elements W\u03c6 := {(i, \u22c6) | i is an irrelevant index for \u03c6}. Finally, let D\u03c6 = {V^1_\u03c6, V^2_\u03c6, . . . , V^{m+1}_\u03c6} be the set of m + 1 actions where for j = 1, 2, . . . , m, we define\n\nV^j_\u03c6 := {(i_\u2113, 1 \u2212 f\u03c6(\u2113)) | 1 \u2264 \u2113 < j} \u222a {(i_j, f\u03c6(j))} \u222a {(i_\u2113, \u22c6) | j < \u2113 \u2264 m} \u222a W\u03c6 \u222a {1},\n\nand\n\nV^{m+1}_\u03c6 := {(i_\u2113, 1 \u2212 f\u03c6(\u2113)) | 1 \u2264 \u2113 \u2264 m} \u222a W\u03c6 \u222a {0}.\n\nThe actions in D\u03c6 are indeed in the decision set D due to the richness property.\nWe claim that D\u03c6 contains exactly one awake action in every round and that the awake action contains the element 1 if and only if \u03c6(xt) = 1. First, we prove uniqueness: if V^j_\u03c6 and V^k_\u03c6 (where j < k) are both awake in the same round, then (i_j, f\u03c6(j)) \u2208 V^j_\u03c6 and (i_j, 1 \u2212 f\u03c6(j)) \u2208 V^k_\u03c6 are both awake elements, contradicting our choice of St. To prove the rest of the claim, we consider two cases:\n\n1. If \u03c6(xt) = 1, then there is at least one j \u2208 {1, 2, . . .
, m} such that xt(i_j) = f\u03c6(j). Let j\u2032 be the smallest such j. Then, by construction, the set V^{j\u2032}_\u03c6 is awake at time t, and 1 \u2208 V^{j\u2032}_\u03c6, as required.\n2. If \u03c6(xt) = 0, then for all j \u2208 {1, 2, . . . , m} we must have xt(i_j) = 1 \u2212 f\u03c6(j). Then, by construction, the set V^{m+1}_\u03c6 is awake at time t, and 0 \u2208 V^{m+1}_\u03c6, as required.\n\nSince every action in D\u03c6 has exactly n + 1 elements, and since we just showed that if V is the awake action in D\u03c6 at time t then 1 \u2208 V if and only if \u03c6(xt) = 1, exactly the same argument as in the beginning of this proof implies that\n\n\u2113t(V) = 1[yt \u2260 \u03c6(xt)]. (3)\n\nFurthermore, since exactly one action in D\u03c6 is awake every round, we have\n\n\u2211_{t=1}^{T} 1[yt \u2260 \u03c6(xt)] = \u2211_{V \u2208 D\u03c6} \u2211_{t: V \u2208 At} \u2113t(V). (4)\n\nFinally, we can bound the regret of algorithm Algdisj (denoted Regret^disj_T) in terms of the regret of algorithm Algosco (denoted Regret^osco_T) as follows:\n\nRegret^disj_T(\u03c6) = \u2211_{t=1}^{T} 1[\u0177t \u2260 yt] \u2212 1[\u03c6(xt) \u2260 yt] \u2264 \u2211_{V \u2208 D\u03c6} \u2211_{t: V \u2208 At} \u2113t(Vt) \u2212 \u2113t(V) = \u2211_{V \u2208 D\u03c6} Regret^osco_T(V) \u2264 |D\u03c6| \u00b7 poly(d) \u00b7 T^{1\u2212\u03b4} = poly(n) \u00b7 T^{1\u2212\u03b4}.\n\nThe first inequality follows by (2) and (4), and the last equation holds since |D\u03c6| \u2264 n + 1 and d \u2264 poly(n).\n\n4.1 Hardness results for Policy Regret and Ranking Regret\n\nIt is easy to see that our technique for proving hardness easily extends to ranking regret (and therefore, policy regret). The reduction simply uses any algorithm for minimizing ranking regret in Algorithm 1 as Algosco. This is because in the proof of Theorem 1, the set D\u03c6 has the property that exactly one action Vt \u2208 D\u03c6 is awake in any round t, and \u2113t(Vt) = 1[yt \u2260 \u0177t]. Thus, if we consider a ranking where the actions in D\u03c6 are ranked at the top positions (in arbitrary order), the loss of this ranking exactly equals the number of errors made by the disjunction \u03c6 on the input sequence. The same arguments as in the proof of Theorem 1 then imply that the regret of Algdisj is bounded by that of Algosco, implying the hardness result.\n\n5 Hard Instances for Specific Problems\n\nNow we apply Theorem 1 to prove that many online sleeping combinatorial optimization problems are as hard as PAC learning DNF expressions by constructing hard instances for them. Note that all these problems admit efficient no-regret algorithms in the non-sleeping setting.\n\n5.1 Online Shortest Path Problem\n\nIn the ONLINE SHORTEST PATH problem, the learner is given a directed graph G = (V, E) and designated source and sink vertices s and t. The ground set is the set of edges, i.e., U = E, and the decision set D is the set of all paths from s to t. The sleeping version of this problem has been called the ONLINE SABOTAGED SHORTEST PATH problem by Koolen et al. [2015], who posed the open question of whether it admits an efficient no-regret algorithm. For any n \u2208 N, a hard instance is the graph G(n) shown in Figure 2. It has 3n + 2 edges that are labeled by the elements of the ground set U = \u22c3_{i=1}^{n} {(i, 0), (i, 1), (i, \u22c6)} \u222a {0, 1}, as required. Now note that any s-t path in this graph has length exactly n + 1, so D satisfies the heaviness property. Furthermore, the richness property is clearly satisfied, since for any s \u2208 {0, 1, \u22c6}^n \u00d7 {0, 1}, the set of edges {(1, s1), (2, s2), . . . , (n, sn), sn+1} is an s-t path and therefore in D.\n\nFigure 2: Graph G(n).\n\nFigure 3: Graph P(n). 
This is the complete bipartite graph described in the text; only the special labeled edges are shown for clarity.\n\n5.2 Online Minimum Spanning Tree Problem\n\nIn the ONLINE MINIMUM SPANNING TREE problem, the learner is given a fixed graph G = (V, E). The ground set here is the set of edges, i.e., U = E, and the decision set D is the set of spanning trees in the graph. For any n \u2208 N, a hard instance is the same graph G(n) shown in Figure 2, except that the edges are undirected. Note that the spanning trees in G(n) are exactly the paths from s to t. The hardness of this problem immediately follows from the hardness of the ONLINE SHORTEST PATHS problem.\n\n5.3 Online k-Subsets Problem\n\nIn the ONLINE k-SUBSETS problem, the learner is given a fixed ground set of elements U. The decision set D is the set of subsets of U of size k. For any n \u2208 N, we construct a hard instance with parameter n of the ONLINE k-SUBSETS problem with k = n + 1 and d = 3n + 2. The set D of all subsets of size k = n + 1 of a ground set U of size d = 3n + 2 clearly satisfies both the heaviness and richness properties.\n\n5.4 Online k-Truncated Permutations Problem\n\nIn the ONLINE k-TRUNCATED PERMUTATIONS problem (also called the ONLINE k-RANKING problem), the learner is given a complete bipartite graph with k nodes on one side and m \u2265 k nodes on the other, and the ground set U is the set of all edges; thus d = km. The decision set D is the set of all maximal matchings, which can be interpreted as truncated permutations of k out of m objects. For any n \u2208 N, we construct a hard instance with parameter n of the ONLINE k-TRUNCATED PERMUTATIONS problem with k = n + 1, m = 3n + 2, and d = km = (n + 1)(3n + 2). Let L = {u1, u2, . . . , un+1} be the nodes on the left side of the bipartite graph, and since m = 3n + 2, let R = {vi,0, vi,1, vi,\u22c6 | i = 1, 2, . . . , n} \u222a {v0, v1} denote the nodes on the right side of the graph. 
The ground set U consists of all d = km = (n + 1)(3n + 2) edges joining nodes in L to nodes in R. We now specify the special 3n + 2 elements of the ground set U: for i = 1, 2, . . . , n, label the edges (ui, vi,0), (ui, vi,1), (ui, vi,\u22c6) by (i, 0), (i, 1), (i, \u22c6) respectively. Finally, label the edges (un+1, v0), (un+1, v1) by 0 and 1 respectively. The resulting bipartite graph P(n) is shown in Figure 3, where only the special labeled edges are shown for clarity.\nNow note that any maximal matching in this graph has exactly n + 1 edges, so the heaviness condition is satisfied. Furthermore, the richness property is satisfied, since for any s \u2208 {0, 1, \u22c6}^n \u00d7 {0, 1}, the set of edges {(1, s1), (2, s2), . . . , (n, sn), sn+1} is a maximal matching and therefore in D.\n\nFigure 4: Graph M(n) for the ONLINE BIPARTITE MATCHING problem.\n\nFigure 5: Graph C(n) for the ONLINE MINIMUM CUT problem.\n\n5.5 Online Bipartite Matching Problem\n\nIn the ONLINE BIPARTITE MATCHING problem, the learner is given a fixed bipartite graph G = (V, E). The ground set here is the set of edges, i.e., U = E, and the decision set D is the set of maximal matchings in G. For any n \u2208 N, a hard instance with parameter n is the graph M(n) shown in Figure 4. It has 3n + 2 edges that are labeled by the elements of the ground set U = \u22c3_{i=1}^{n} {(i, 0), (i, 1), (i, \u22c6)} \u222a {0, 1}, as required. Now note that any maximal matching in this graph has size exactly n + 1, so D satisfies the heaviness property. Furthermore, the richness property is clearly satisfied, since for any s \u2208 {0, 1, \u22c6}^n \u00d7 {0, 1}, the set of edges {(1, s1), (2, s2), . . .
, (n, sn), sn+1} is a maximal matching and therefore in D.\n\n5.6 Online Minimum Cut Problem\n\nIn the ONLINE MINIMUM CUT problem, the learner is given a fixed graph G = (V, E) with a designated pair of vertices s and t. The ground set here is the set of edges, i.e., U = E, and the decision set D is the set of cuts separating s and t: a cut here is a set of edges that, when removed from the graph, disconnects s from t. For any n \u2208 N, a hard instance is the graph C(n) shown in Figure 5. It has 3n + 2 edges that are labeled by the elements of the ground set U = \u22c3_{i=1}^{n} {(i, 0), (i, 1), (i, \u22c6)} \u222a {0, 1}, as required. Now note that any cut in this graph has size at least n + 1, so D satisfies the heaviness property. Furthermore, the richness property is clearly satisfied, since for any s \u2208 {0, 1, \u22c6}^n \u00d7 {0, 1}, the set of edges {(1, s1), (2, s2), . . . , (n, sn), sn+1} is a cut and therefore in D.\n\n6 Conclusion\n\nIn this paper we showed that obtaining an efficient no-regret algorithm for the sleeping versions of several natural online combinatorial optimization problems is as hard as efficiently PAC learning DNF expressions, a long-standing open problem. Our reduction technique requires only very modest conditions for hard instances of the problem of interest, and in fact is considerably more flexible than the specific form presented in this paper. We believe that almost any natural combinatorial optimization problem that includes instances with exponentially many solutions will be a hard problem in its online sleeping variant. Furthermore, our hardness result is via stochastic i.i.d. availabilities and losses, a rather benign form of adversary. 
This suggests that obtaining sublinear per-action regret is perhaps a rather hard objective, and that to obtain efficient algorithms we might need to either (a) make suitable simplifications of the regret criterion or (b) restrict the adversary\u2019s power.\n\nReferences\n\nYasin Abbasi-Yadkori, Peter L. Bartlett, Varun Kanade, Yevgeny Seldin, and Csaba Szepesv\u00e1ri. Online learning in Markov decision processes with adversarially chosen transition probability distributions. In Advances in Neural Information Processing Systems (NIPS), pages 2508\u20132516, 2013.\n\nJean-Yves Audibert, S\u00e9bastien Bubeck, and G\u00e1bor Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31\u201345, 2013.\n\nNicol\u00f2 Cesa-Bianchi and G\u00e1bor Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, 2006.\n\nYoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119\u2013139, 1997.\n\nYoav Freund, Robert E. Schapire, Yoram Singer, and Manfred K. Warmuth. Using and combining predictors that specialize. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pages 334\u2013343. ACM, 1997.\n\nAdam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291\u2013307, 2005.\n\nAdam Tauman Kalai, Varun Kanade, and Yishay Mansour. Reliable agnostic learning. Journal of Computer and System Sciences, 78(5):1481\u20131495, 2012.\n\nVarun Kanade and Thomas Steinke. Learning hurdles for sleeping experts. ACM Transactions on Computation Theory (TOCT), 6(3):11, 2014.\n\nVarun Kanade, H. Brendan McMahan, and Brent Bryan. 
Sleeping experts and bandits with stochastic action availability and adversarial rewards. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 272\u2013279, 2009.\n\nMichael J. Kearns, Robert E. Schapire, and Linda M. Sellie. Toward efficient agnostic learning. Machine Learning, 17(2\u20133):115\u2013141, 1994.\n\nRobert Kleinberg, Alexandru Niculescu-Mizil, and Yogeshwer Sharma. Regret bounds for sleeping experts and bandits. Machine Learning, 80(2\u20133):245\u2013272, 2010.\n\nAdam R. Klivans and Rocco Servedio. Learning DNF in time 2^{\u00d5(n^{1/3})}. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing (STOC), pages 258\u2013265. ACM, 2001.\n\nWouter M. Koolen, Manfred K. Warmuth, and Jyrki Kivinen. Hedging structured concepts. In Adam Tauman Kalai and Mehryar Mohri, editors, Proceedings of the 23rd Conference on Learning Theory (COLT), pages 93\u2013105, 2010.\n\nWouter M. Koolen, Manfred K. Warmuth, and Dmitry Adamskiy. Open problem: Online sabotaged shortest path. In Proceedings of the 28th Conference on Learning Theory (COLT), 2015.\n\nNick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212\u2013261, 1994.\n\nGergely Neu and Michal Valko. Online combinatorial optimization with stochastic decision sets and adversarial losses. In Advances in Neural Information Processing Systems, pages 2780\u20132788, 2014.\n\nEiji Takimoto and Manfred K. Warmuth. Path kernels and multiplicative updates. The Journal of Machine Learning Research, 4:773\u2013818, 2003.", "award": [], "sourceid": 1139, "authors": [{"given_name": "Satyen", "family_name": "Kale", "institution": "Google"}, {"given_name": "Chansoo", "family_name": "Lee", "institution": "University of Michigan"}, {"given_name": "David", "family_name": "Pal", "institution": "Google"}]}