{"title": "Smooth Interactive Submodular Set Cover", "book": "Advances in Neural Information Processing Systems", "page_first": 118, "page_last": 126, "abstract": "Interactive submodular set cover is an interactive variant of submodular set cover over a hypothesis class of submodular functions, where the goal is to satisfy all sufficiently plausible submodular functions to a target threshold using as few (cost-weighted) actions as possible. It models settings where there is uncertainty regarding which submodular function to optimize. In this paper, we propose a new extension, which we call smooth interactive submodular set cover, that allows the target threshold to vary depending on the plausibility of each hypothesis. We present the first algorithm for this more general setting with theoretical guarantees on optimality. We further show how to extend our approach to deal with real-valued functions, which yields new theoretical results for real-valued submodular set cover for both the interactive and non-interactive settings.", "full_text": "Smooth Interactive Submodular Set Cover\n\nBryan He\n\nStanford University\n\nbryanhe@stanford.edu\n\nYisong Yue\n\nCalifornia Institute of Technology\n\nyyue@caltech.edu\n\nAbstract\n\nInteractive submodular set cover is an interactive variant of submodular set cover\nover a hypothesis class of submodular functions, where the goal is to satisfy\nall suf\ufb01ciently plausible submodular functions to a target threshold using as few\n(cost-weighted) actions as possible. It models settings where there is uncertainty\nregarding which submodular function to optimize. In this paper, we propose a new\nextension, which we call smooth interactive submodular set cover, that allows the\ntarget threshold to vary depending on the plausibility of each hypothesis. We\npresent the \ufb01rst algorithm for this more general setting with theoretical guarantees\non optimality. 
We further show how to extend our approach to deal with real-valued functions, which yields new theoretical results for real-valued submodular set cover for both the interactive and non-interactive settings.

1 Introduction

In interactive submodular set cover (ISSC) [10, 11, 9], the goal is to interactively satisfy all plausible submodular functions in as few actions as possible. ISSC is a wide-encompassing framework that generalizes both submodular set cover [24] by virtue of being interactive, as well as some instances of active learning by virtue of many active learning criteria being submodular [12, 9].
A key characteristic of ISSC is the a priori uncertainty regarding the correct submodular function to optimize. For example, in personalized recommender systems, the system does not know the user's preferences a priori, but can learn them interactively via user feedback. Thus, any algorithm must choose actions in order to disambiguate between competing hypotheses as well as optimize for the most plausible ones – this issue is also known as the exploration-exploitation tradeoff.
In this paper, we propose the smooth interactive submodular set cover problem, which addresses two important limitations of previous work. The first limitation is that conventional ISSC [10, 11, 9] only allows for a single threshold to satisfy, and this "all or nothing" nature can be inflexible for settings where the covering goal should vary smoothly (e.g., based on plausibility). In smooth ISSC, one can smoothly vary the target threshold of the candidate submodular functions according to their plausibility. In other words, the less plausible a hypothesis is, the less we emphasize maximizing its associated utility function. We present a simple greedy algorithm for smooth ISSC with provable guarantees on optimality.
We also show that our smooth ISSC framework and algorithm fully generalize previous instances of and algorithms for ISSC by reducing back to just one threshold.
One consequence of smooth ISSC is the need to optimize for real-valued functions, which leads to the second limitation of previous work. Many natural classes of submodular functions are real-valued (cf. [25, 5, 17, 21]). However, submodular set cover (both interactive and non-interactive) has only been rigorously studied for integral or rational functions with fixed denominator, which highlights a significant gap between theory and practice. We propose a relaxed version of smooth ISSC using an approximation tolerance ε, such that one needs only to satisfy the set cover criterion to within ε. We extend our greedy algorithm to provably optimize for real-valued submodular functions within this ε tolerance. To the best of our knowledge, this yields the first theoretically rigorous algorithm for real-valued submodular set cover (both interactive and non-interactive).

Problem 1 Smooth Interactive Submodular Set Cover
1: Given:
  1. Hypothesis class H (does not necessarily contain h*)
  2. Query set Q and response set R with known q(h) ⊆ R for q ∈ Q, h ∈ H
  3. Modular query cost function c defined over Q
  4. Monotone submodular objective functions F_h : 2^(Q×R) → R≥0 for h ∈ H
  5. Monotone submodular distance functions G_h : 2^(Q×R) → R≥0 for h ∈ H, with G_h(S ⊕ (q, r)) − G_h(S) = 0 for any S if r ∈ q(h)
  6. Threshold function α : R≥0 → R≥0 mapping a distance to a required objective function value
2: Protocol: For i = 1, ..., ∞: ask a question q̂_i ∈ Q and receive a response r̂_i ∈ q̂_i(h*).
3: Goal: Using minimal cost Σ_i c(q̂_i), terminate when F_h(Ŝ) ≥ α(G_h(S*)) for all h ∈ H, where Ŝ = {(q̂_i, r̂_i)}_i and S* ≜ ∪_{q∈Q, r∈q(h*)} {(q, r)}.

2 Background

Submodular Set Cover. In the basic submodular set cover problem [24], we are given an action set Q and a monotone submodular set function F : 2^Q → R≥0 that maps subsets A ⊆ Q to non-negative scalar values. A set function F is monotone and submodular if and only if, for all A ⊆ B ⊆ Q and q ∈ Q:
F(A ⊕ q) ≥ F(A) and F(A ⊕ q) − F(A) ≥ F(B ⊕ q) − F(B),
respectively, where ⊕ denotes set addition (i.e., A ⊕ q ≡ A ∪ {q}). In other words, monotonicity implies that adding an element always yields non-negative gain, and submodularity implies that adding to a smaller set A results in at least as large a gain as adding to a larger set B. We also assume that F(∅) = 0.
Each q ∈ Q is associated with a modular or additive cost c(q). Given a target threshold α, the goal is to select a set A that satisfies F(A) ≥ α with minimal cost c(A) = Σ_{q∈A} c(q). This problem is NP-hard; but for integer-valued F, simple greedy forward selection provably achieves near-optimal cost of at most (1 + ln(max_{a∈Q} F({a}))) OPT [24], and is typically very effective in practice.
One motivating application is content recommendation [5, 4, 25, 11, 21], where Q are items to recommend, F(A) captures the utility of A ⊆ Q, and α is the satisfaction goal. Monotonicity of F captures the property that total utility never decreases as one recommends more items, and submodularity captures the diminishing returns property when recommending redundant items.
Interactive Submodular Set Cover. In the basic interactive setting [10], the decision maker must optimize over a hypothesis class H of submodular functions F_h.
The setting is interactive, whereby the decision maker chooses an action (or query) q ∈ Q, and the environment provides a response r ∈ R. Each query q is now a function mapping hypotheses H to responses R (i.e., q(h) ∈ R), and the environment provides responses according to an unknown true hypothesis h* ∈ H (i.e., r ≡ q(h*)). This process iterates until F_{h*}(S) ≥ α, where S denotes the set of observed question/response pairs: S = {(q, r)} ⊆ Q×R. The goal is to satisfy F_{h*}(S) ≥ α with minimal cost c(S) = Σ_{(q,r)∈S} c(q).
For example, when recommending movies to a new user with unknown interests (cf. [10, 11]), H can be a set of user types or movie genres (e.g., H = {Action, Drama, Horror, ...}). Then Q would contain individual movies that can be recommended, and R would be a "yes" or "no" response or an integer rating representing how interested the user (modeled as h*) is in a given movie.
The interactive setting is both a learning and covering problem, as opposed to just a covering problem. The decision maker must balance between disambiguating between hypotheses in H (i.e., identifying which is the true h*) and satisfying the covering goal F_{h*}(S) ≥ α; this issue is also known as the exploration-exploitation tradeoff. Noisy ISSC [11] extends basic ISSC by no longer assuming the true h* is in H, and uses a distance function G_h and tolerance κ such that the goal is to satisfy F_h(S) ≥ α for all sufficiently plausible h, where plausibility is defined as G_h(S) ≤ κ.

3 Problem Statement

We now present the smooth interactive submodular set cover problem, which generalizes basic and noisy ISSC [10, 11] (described in Section 2).
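The noisy-ISSC covering goal just described can be written as a one-line stopping test. The sketch below is our own illustration (all names are ours); smooth ISSC, defined next, replaces the fixed threshold α by a plausibility-dependent value α(G_h(S*)).

```python
def noisy_issc_covered(S, H, F, G, alpha, kappa):
    """Noisy ISSC termination check: every hypothesis h that is still
    plausible (G_h(S) <= kappa) must be satisfied (F_h(S) >= alpha)."""
    return all(F[h](S) >= alpha for h in H if G[h](S) <= kappa)

# Two toy hypotheses evaluated on observed query/response pairs S.
F = {"h1": lambda S: len(S), "h2": lambda S: 2 * len(S)}
G = {"h1": lambda S: 0.0,    "h2": lambda S: 5.0}  # h2 disagrees with S
S = {("q1", "yes"), ("q2", "no")}
print(noisy_issc_covered(S, F.keys(), F, G, alpha=2, kappa=1))  # True: only h1 is plausible
```

With κ = 1, hypothesis h2 has distance 5 and is ignored, so only h1 must meet the threshold.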
Like basic ISSC, each hypothesis h ∈ H is associated with a utility function F_h : 2^(Q×R) → R≥0 that maps sets of query/response pairs to non-negative scalars.

Figure 1: Examples of (a) multiple thresholds, (b) approximate multiple thresholds, (c) a continuous convex threshold, and (d) an approximate continuous convex threshold. Each panel plots F_h against G_h. For the approximate setting, we essentially allow for satisfying any threshold function that resides in the yellow region.

Like noisy ISSC, the hypothesis class H does not necessarily contain the true h* (i.e., the agnostic setting). Each h ∈ H is associated with a distance or disagreement function G_h : 2^(Q×R) → R≥0 which maps sets of question/response pairs to a disagreement score (i.e., the larger G_h(S) is, the more h disagrees with S). We further require that F_h(∅) = 0 and G_h(∅) = 0.
Problem 1 describes the general problem setting. Let S* ≜ ∪_{q∈Q, r∈q(h*)} {(q, r)} denote the set of all possible question/response pairs given by h*. The goal is to construct a question/response set Ŝ with minimal cost such that, for every h ∈ H, we have F_h(Ŝ) ≥ α(G_h(S*)), where α(·) maps disagreement values to desired utilities. In general, α(·) is a non-increasing function, since the goal is to optimize more for the most plausible hypotheses in H. We describe two versions of α(·) below.
Version 1: Step Function (Multiple Thresholds). The first version uses a decreasing step function (see Figure 1(a)). Given a pair of sequences α_1 > ... > α_N > 0 and 0 < κ_1 < ... < κ_N, the threshold function is α(v) = α_{n_κ(v)}, where n_κ(v) = min{n ∈ {0, ..., N+1} | v < κ_n}, and α_0 ≜ ∞, α_{N+1} ≜ 0, κ_0 ≜ 0, κ_{N+1} ≜ ∞. The goal in Problem 1 is equivalently: "∀h ∈ H and n = 1, ..., N: satisfy F_h(Ŝ) ≥ α_n whenever G_h(S*) < κ_n." This version is a strict generalization of noisy ISSC, which uses only a single α and κ.
Version 2: Convex Threshold Curve. The second version uses a convex α(·) that decreases continuously as G_h(S*) increases (see Figure 1(c)), and is not a strict generalization of noisy ISSC.
Approximate Thresholds. Finally, we also consider a relaxed version of smooth ISSC, whereby we only require that the objectives F_h be satisfied to within some tolerance ε ≥ 0. More formally, we say that we approximately solve Problem 1 with tolerance ε if its goal is redefined as: "using minimal cost Σ_i c(q̂_i), guarantee F_h(Ŝ) ≥ α(G_h(S*)) − ε for all h ∈ H." See Figures 1(b) and 1(d) for the approximate versions of the multiple thresholds and convex versions, respectively.
ISSC has only been rigorously studied when the utility functions F_h are rational-valued with a fixed denominator. We show in Section 4.3 how to efficiently solve the approximate version of smooth ISSC when the F_h are real-valued, which also yields a new approach for approximately solving the classical non-interactive submodular set cover problem with real-valued objective functions.

4 Algorithm & Main Results

A key question in the study of interactive optimization is how to balance the exploration-exploitation tradeoff. On the one hand, one should exploit current knowledge to efficiently satisfy the plausible submodular functions.
However, hypotheses that seem plausible might actually not be, due to imperfections in the algorithm's knowledge. One should thus explore by playing actions that disambiguate the plausibility of competing hypotheses. Our setting is further complicated due to also solving a combinatorial optimization problem (submodular set cover), which is in general intractable.

4.1 Approach Outline

We present a general greedy algorithm, described in Algorithm 1 below, for solving smooth ISSC with provably near-optimal cost. Algorithm 1 requires as input a submodular meta-objective F̄

Algorithm 1 Worst Case Greedy Algorithm for Smooth Interactive Submodular Set Cover
1: input: F̄ // Submodular Meta-Objective
2: input: F̄max // Termination Threshold for F̄
3: input: Q // Query or Action Set
4: input: R // Response Set
5: S ← ∅
6: while F̄(S) < F̄max do
7:   q̂ ← argmax_{q∈Q} min_{r∈R} (F̄(S ⊕ (q, r)) − F̄(S)) / c(q)
8:   Play q̂, observe r̂
9:   S ← S ⊕ (q̂, r̂)
10: end while

Variable | Definition
H | Set of hypotheses
Q | Set of actions or queries
R | Set of responses
F_h | Monotone non-decreasing submodular utility function
G_h | Monotone non-decreasing submodular distance function
F̄ | Monotone non-decreasing submodular function unifying F_h, G_h and the thresholds
F̄max | Maximum value held by F̄
D_F | Denominator for F_h (when rational)
D_G | Denominator for G_h (when rational)
α(·) | Continuous convex threshold
α_i | Thresholds for F (α_1 is largest)
κ_i | Thresholds for G (κ_1 is smallest)
N | Number of thresholds
ε | Approximation tolerance for the real-valued case
F'_h | Surrogate utility function for the approximate version
α'_n | Surrogate thresholds for the approximate version

Figure 2: Summary of notation used.
The top portion is used in all settings. The middle portion is used for the multiple thresholds setting. The bottom portion is used for real-valued functions.

that quantifies the exploration-exploitation trade-off, and the specific instantiation of F̄ depends on which version of smooth ISSC is being solved. Algorithm 1 greedily optimizes for the worst case outcome at each iteration (Line 7) until a termination condition F̄ ≥ F̄max has been met (Line 6).
The construction of F̄ is essentially a reduction of smooth ISSC to a simpler submodular set cover problem, and generalizes the reduction approach in [11]. In particular, we first lift the analysis of [11] to deal with multiple thresholds (Section 4.2). We then show how to deal with approximate thresholds in the real-valued setting (Section 4.3), which finally allows us to address the continuous threshold setting (Section 4.4). Our cost guarantees are stated relative to the general cover cost (GCC), which lower bounds the optimal cost, as stated in Definition 4.1 and Lemma 4.2 below. Via this reduction, we can show that our approach achieves cost bounded by (1 + ln F̄max) GCC ≤ (1 + ln F̄max) OPT. For clarity of exposition, all proofs are deferred to the supplementary material.
Definition 4.1 (General Cover Cost (GCC)). Define oracles T ∈ R^Q to be functions mapping questions to responses, and T(Q̂) ≜ ∪_{q̂_i ∈ Q̂} {(q̂_i, T(q̂_i))}; that is, T(Q̂) is the set of question-response pairs given by T for the set of questions Q̂. Define the General Cover Cost as:

GCC ≜ max_{T ∈ R^Q} ( min_{Q̂ : F̄(T(Q̂)) ≥ F̄max} c(Q̂) ).

Lemma 4.2 (Lemma 3 from [11]). If there is a question asking strategy for satisfying F̄(Ŝ) ≥ F̄max with worst case cost C*, then GCC ≤ C*.
Thus GCC ≤ OPT.

4.2 Multiple Thresholds Version

We begin with the multiple thresholds version. In this section, we assume that each F_h and G_h are rational-valued with fixed denominators D_F and D_G, respectively.¹

¹When each F_h and/or G_h are integer-valued, then D_F = 1 and/or D_G = 1, respectively.

Figure 3: Depicting the relationship between the terms defined in Definition 4.3. (A) If F̄_{h_i,n} ≥ F̄_{h_i,n}max = (α_n − α_{n+1})(κ_n − κ_{n−1}), then either F_{h_i} ≥ α_n or G_{h_i} ≥ κ_n; this generates the tradeoff between satisfying either of the two thresholds. (B) If F̄_{h_i} ≥ F̄_h max, then F̄_{h_i,n} ≥ F̄_{h_i,n}max for all n ∈ {1, ..., N}; this enforces that for all n, at least one of the thresholds α_n or κ_n must be satisfied. (C) If F̄ ≥ F̄max, then F̄_h ≥ F̄_h max for all h ∈ H; this enforces that all hypotheses must be satisfied.

We first define a doubly truncated version of each hypothesis' submodular utility and distance function:

F_{h,α_n,α_j}(Ŝ) ≜ max(min(F_h(Ŝ), α_n), α_j) − α_j,   (1)
G_{h,κ_n,κ_j}(Ŝ) ≜ max(min(G_h(Ŝ), κ_n), κ_j) − κ_j.   (2)

In other words, F_{h,α_n,α_j} is truncated from below at α_j and from above at α_n (it is assumed that α_n > α_j), and is offset by −α_j so that F_{h,α_n,α_j}(∅) = 0.
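For concreteness, the double truncation in (1) can be written directly in code; the sketch below is our own illustration, using set cardinality as a stand-in for F_h.

```python
def doubly_truncated(F, upper, lower):
    """max(min(F(S), upper), lower) - lower, as in Eq. (1):
    truncate from above at `upper` and from below at `lower`,
    then shift so the empty set maps to 0 (assumes upper > lower)."""
    return lambda S: max(min(F(S), upper), lower) - lower

F_trunc = doubly_truncated(len, 3, 1)  # alpha_n = 3, alpha_j = 1
print(F_trunc(set()))         # 0
print(F_trunc({1, 2}))        # 1
print(F_trunc({1, 2, 3, 4}))  # 2  (capped at alpha_n - alpha_j)
```

Truncation preserves monotonicity and submodularity, which is what lets the constructions below combine these pieces into a single submodular meta-objective.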
G_{h,κ_n,κ_j} is constructed analogously. Using (1) and (2), we can define the general forms of F̄ and F̄max, which can be instantiated to address different versions of smooth ISSC.
Definition 4.3 (General form of F̄ and F̄max).

F̄_{h,n}(Ŝ) ≜ ((κ_n − κ_{n−1}) − G_{h,κ_n,κ_{n−1}}(Ŝ)) F_{h,α_n,α_{n+1}}(Ŝ) + G_{h,κ_n,κ_{n−1}}(Ŝ)(α_n − α_{n+1}),

F̄_h(Ŝ) ≜ C_F̄ Σ_{n=1}^{N} [ ( Π_{j≠n} (κ_j − κ_{j−1}) ) F̄_{h,n}(Ŝ) ],

F̄(Ŝ) ≜ Σ_{h∈H} F̄_h(Ŝ),   F̄max ≜ |H| C_F C_G.

The coefficient C_F̄ converts each F̄_h to be integer-valued, C_F is the contribution to F̄max from F_h and α_n, and C_G is the contribution to F̄max from G_h and κ_n.
Definition 4.4 (Multiple Thresholds Version of ISSC). Given α_1, ..., α_N and κ_1, ..., κ_N, we instantiate F̄ and F̄max in Definition 4.3 via:

C_F̄ = D_F D_G^N,   C_F = D_F α_1,   C_G = D_G^N Π_{n=1}^{N} (κ_n − κ_{n−1}).

F̄ in Definition 4.4 trades off between exploitation (maximizing the plausible F_h's) and exploration (disambiguating plausibility of the F_h's) by allowing each F̄_h to reach its maximum by either F_h reaching α_n or G_h reaching κ_n. In other words, each F̄_h can be satisfied with either a sufficiently large utility F_h or a sufficiently large distance G_h. Figure 3 shows the logical relationships between these components.
We prove in Appendix A that F̄ is monotone submodular, and that finding an S such that F̄(S) ≥ F̄max is equivalent to solving Problem 1.
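The tradeoff term F̄_{h,n} from Definition 4.3 can be sanity-checked numerically. The sketch below (our own code, with toy constant F_h and G_h) shows that F̄_{h,n} attains its maximum (α_n − α_{n+1})(κ_n − κ_{n−1}) as soon as either F_h reaches α_n or G_h reaches κ_n.

```python
def trunc(v, hi, lo):
    # Doubly truncated, shifted value, as in Eqs. (1) and (2).
    return max(min(v, hi), lo) - lo

def meta_term(F_h, G_h, a_hi, a_lo, k_hi, k_lo):
    """F-bar_{h,n}: ((k_hi - k_lo) - G_t) * F_t + G_t * (a_hi - a_lo),
    where F_t and G_t are the truncated utility and distance."""
    def f(S):
        F_t = trunc(F_h(S), a_hi, a_lo)
        G_t = trunc(G_h(S), k_hi, k_lo)
        return ((k_hi - k_lo) - G_t) * F_t + G_t * (a_hi - a_lo)
    return f

max_val = (3 - 1) * (2 - 0)  # (a_hi - a_lo) * (k_hi - k_lo) = 4
satisfied   = meta_term(lambda S: 3.0, lambda S: 0.0, 3, 1, 2, 0)  # F_h at a_hi
implausible = meta_term(lambda S: 0.0, lambda S: 9.0, 3, 1, 2, 0)  # G_h past k_hi
print(satisfied(set()) == max_val, implausible(set()) == max_val)  # True True
```

Either route maxes out the term, which is exactly the tradeoff depicted in Figure 3(A).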
For F̄ to be submodular, we also require Condition 4.5, which is essentially a discrete analogue of the condition that a continuous α(·) should be convex.
Condition 4.5. The sequence ⟨(α_n − α_{n+1}) / (κ_n − κ_{n−1})⟩_{n=1}^{N} is non-increasing.
Theorem 4.6. Given Condition 4.5, Algorithm 1 using Definition 4.4 solves the multiple thresholds version of Problem 1 using cost at most (1 + ln(|H| D_F D_G^N α_1 Π_{n=1}^{N} (κ_n − κ_{n−1}))) GCC. If each G_h is integral and κ_n = κ_{n−1} + 1, then the bound simplifies to (1 + ln(|H| D_F α_1)) GCC.
We present an alternative formulation in Appendix D.2 that has better bounds when D_G is large, but is less flexible and cannot be easily extended to the real-valued and convex threshold curve settings.

4.3 Approximate Thresholds for Real-Valued Functions

Solving even 
non-interactive submodular set cover is extremely challenging when the utility functions F_h are real-valued. For example, Appendix B.1 describes a setting where the greedy algorithm performs arbitrarily poorly. We now extend the results from Section 4.2 to real-valued F_h and α_1, ..., α_N.
Rather than trying to solve the problem exactly, we instead solve a relaxed or approximate version, which will be useful for the convex threshold curve setting. Let ε > 0 denote a pre-specified approximation tolerance for F_h, ⌈·⌉_γ denote rounding up to the nearest multiple of γ, and ⌊·⌋_γ denote rounding down to the nearest multiple of γ. We define a surrogate problem:
Definition 4.7 (Approximate Thresholds for Real-Valued Functions). Define the following approximations to F_h and α_n:

F'_h(Ŝ) ≜ ⌈ F_h(Ŝ) + (ε/D) Σ_{i=1}^{|Ŝ|} (|Q| + 1 − i) ⌉_{ε/D},

α'_n ≜ (ε/D) ⌊ (D/ε) [ α_n − (ε/D) Σ_{i=1}^{n} (2N − 2i) D_G^{N−i+1} Π_{j=i}^{N} (κ_j − κ_{j−1}) ] ⌋,

D ≜ [ Σ_{i=1}^{|Q|} (|Q| + 1 − i) + Σ_{i=1}^{N} (2N − 2i) D_G^{N−i+1} Π_{j=i}^{N} (κ_j − κ_{j−1}) ] + 2.

Instantiate F̄ and F̄max in Definition 4.3 using F'_h, α'_n above, G_h, κ_n, and:

C_F̄ = D_G^N,   C_F = α'_1,   C_G = D_G^N Π_{n=1}^{N} (κ_n − κ_{n−1}).

We prove in Appendix B that Definition 4.7 is an instance of a smooth ISSC problem, and that solving Definition 4.7 approximately solves the original real-valued smooth ISSC problem.
Theorem 4.8. Given Condition 4.5, Algorithm 1 using Definition 4.7 will approximately solve the real-valued multiple thresholds version of Problem 1 with tolerance ε using cost at most (1 + ln(|H| α'_1 D_G^N Π_{n=1}^{N} (κ_n − κ_{n−1}))) GCC.
We show in Appendix B.2 how to apply this result to approximately solve the basic submodular set cover problem with real-valued objectives. Note that if ε is selected as the smallest distinct difference between values in F_h, then the approximation will be exact.

4.4 Convex Threshold Curve Version

We now address the setting where the threshold curve α(·) is continuous and convex. We again solve the approximate version, since the threshold curve α(·) is necessarily real-valued. Let ε > 0 be the pre-specified tolerance for F'_h. Let N be defined so that N D_G is the maximal value of G_h. We convert the continuous α(·) to a multiple thresholds version (with N thresholds) that is within an ε-approximation of the former, as shown below.
Definition 4.9 (Equivalent Multiple Thresholds for Continuous Convex Curve). 
Instantiate F̄ and F̄max in Definition 4.3 using G_h without modification, and a sequence of thresholds:

F'_h(Ŝ) ≜ ⌈ F_h(Ŝ) + (ε/D) Σ_{i=1}^{|Ŝ|} (|Q| + 1 − i) ⌉_{ε/D},

α'_n ≜ (ε/D) ⌊ (D/ε) [ α(n) − (ε/D) Σ_{i=1}^{n} (2N − 2i) D_G^{N−i+1} Π_{j=i}^{N} (κ_j − κ_{j−1}) ] ⌋,

κ_n ≜ D_G n,

with constants set as:

C_F̄ = 1,   C_F = α'_1,   C_G = D_G^N Π_{n=1}^{N} (κ_n − κ_{n−1}) = D_G^N.

Note that the F'_h are not too expensive to compute. We prove in Appendix C that satisfying this set of thresholds is equivalent to satisfying the original curve α(·) within ε-error. Note also that Definition 4.9 uses the same form as Definition 4.7 to handle the approximation of real-valued functions.
Theorem 4.10. Applying Algorithm 1 using Definition 4.9 approximately solves the convex threshold version of Problem 1 with tolerance ε using cost at most (1 + ln(|H| α'_1 D_G^N)) GCC.
Note that if ε is sufficiently large, then N could in principle be smaller, which can lead to less conservative approximations. There may also be more precise approximations by reducing to other formulations for the multi-threshold setting (e.g., Appendix D.2).

5 Simulation Experiments

Comparison of Methods to Solve Multiple Thresholds. We compared our multiple thresholds method against multiple baselines (see Appendix D for more details) in a range of simulation settings (see Appendix E.1). Figure 4 shows the results. We see that our approach is consistently amongst the best performing methods.
The primary competitor is the circuit of constraints approach from [11] (see Appendix D.3 for a comparison of the theoretical guarantees). We also note that all approaches dramatically outperform their worst-case guarantees.

Figure 4: Comparison against baselines (Multiple Threshold (Def 4.4), Alternative (Def D.1), Circuit (Def D.6), Forward (Sec D.1), Backward (Sec D.1)) in three simulation settings.

Validating Approximation Tolerances. We also validated the efficacy of our approximate thresholds relaxation (see Appendix E.2 for more details of the setup). Figure 5 shows the results. We see that the actual deviation from the original smooth ISSC problem is much smaller than the specified ε, which suggests that our guarantees are rather conservative. For instance, at ε = 15, the algorithm is allowed to terminate immediately. We also see that the cost to completion steadily decreases as ε increases, which agrees with our theoretical results.

Figure 5: Comparing cost and deviation from the exact function for varying ε.

6 Summary of Results & Discussion

Figure 6 summarizes the size of F̄max (or F̄'max for real-valued functions) for the various settings. Recall that our cost guarantees take the form (1 + ln F̄max) OPT. When the F_h are real-valued, we instead solve the smooth ISSC problem approximately with cost guarantee (1 + ln F̄'max) OPT. Our results are well developed for many different versions of the utility functions F_h, but are less flexible for the distance functions G_h. For example, even for rational-valued G_h, F̄max scales as D_G^N, which is not desirable. The restriction of G_h to be rational (or integral) leads to a relatively straightforward reduction of the continuous convex version of α(·) to a multiple thresholds version. In fact, our formulation can be extended to deal with real-valued G_h and κ_n in the multiple thresholds version; however, the resulting F̄ is no longer guaranteed to be submodular. It is possible that a different assumption than the one imposed in Condition 4.5 is required to prove more general results.

F | G | Multiple Thresholds | Convex Threshold Curve
Rational | Rational | |H| α_1 D_F D_G^N Π_{i=1}^{N} (κ_i − κ_{i−1}) | |H| α_1 D_F D_G^N
Real | Rational | |H| α'_1 D_G^N Π_{i=1}^{N} (κ_i − κ_{i−1}) | |H| α'_1 D_G^N

Figure 6: Summarizing F̄max. When the F_h are real-valued, we show F̄'max instead.

Our analysis appears to be overly conservative for many settings. For instance, all the approaches we evaluated empirically achieved much better performance than their worst-case guarantees. It would be interesting to identify ways to constrain the problem and develop tighter theoretical guarantees.

7 Other Related Work

Submodular optimization is an important problem that arises across many settings, including sensor placements [16, 15], summarization [26, 17, 23], inferring latent influence networks [8], diversified recommender systems [5, 4, 25, 21], and multiple solution prediction [1, 3, 22, 19].
However, the majority of previous work has focused on offline submodular optimization, whereby the submodular function to be optimized is fixed a priori (i.e., does not vary depending on feedback).
There are two typical ways that a submodular optimization problem can be made interactive. The first is online submodular optimization, where an unknown submodular function must be re-optimized repeatedly over many sessions in an online or repeated-games fashion [20, 25, 21]. In this setting, feedback is typically provided only at the conclusion of a session, and so adapting from feedback is performed between sessions. In other words, each session consists of a non-interactive submodular optimization problem, and the technical challenge stems from the fact that the submodular function is unknown a priori and must be learned from feedback provided post optimization in each session – this setting is often referred to as inter-session interactive optimization.
The other way to make submodular optimization interactive, which we consider in this paper, is to make feedback available immediately after each action taken. In this way, one can simultaneously learn about and optimize for the unknown submodular function within a single optimization session – this setting is often referred to as intra-session interactive optimization. One can also consider settings that allow for both intra-session and inter-session interactive optimization.
Perhaps the most well-studied application of intra-session interactive submodular optimization is active learning [10, 7, 11, 9, 2, 14, 13], where the goal is to quickly reduce the hypothesis class to some target residual uncertainty for planning or decision making.
Many instances of noisy and approximate active learning can be formulated as an interactive submodular set cover problem [9].
A related setting is adaptive submodularity [7, 2, 6, 13], a probabilistic setting that essentially requires that the conditional expectation over the hypothesis set of submodular functions is itself a submodular function. In contrast, we require that the hypothesis class be pointwise submodular (i.e., each hypothesis corresponds to a different submodular utility function). Although neither adaptive submodularity nor pointwise submodularity is a strict generalization of the other (cf. [7, 9]), in practice it can often be easier to model application settings using pointwise submodularity.
The "flipped" problem is to maximize utility with a bounded budget, commonly known as the budgeted submodular maximization problem [18]. Interactive budgeted maximization has been analyzed rigorously for adaptive submodular problems [7], but it remains a challenge to develop provably near-optimal interactive algorithms for pointwise submodular utility functions.
8 Conclusions
We introduced smooth interactive submodular set cover, a smoothed generalization of previous ISSC frameworks. Smooth ISSC allows the target threshold to vary based on the plausibility of each hypothesis. Smooth ISSC also introduces an approximate threshold solution concept for real-valued functions, which extends to basic submodular set cover with real-valued objectives as well. We developed the first provably near-optimal algorithm for this setting.

References
[1] Dhruv Batra, Payman Yadollahpour, Abner Guzman-Rivera, and Gregory Shakhnarovich. Diverse M-best solutions in Markov random fields. In European Conference on Computer Vision (ECCV), 2012.
[2] Yuxin Chen and Andreas Krause. Near-optimal batch mode active learning and adaptive submodular optimization.
In International Conference on Machine Learning (ICML), 2013.
[3] Debadeepta Dey, Tommy Liu, Martial Hebert, and J. Andrew Bagnell. Contextual sequence prediction via submodular function optimization. In Robotics: Science and Systems Conference (RSS), 2012.
[4] Khalid El-Arini and Carlos Guestrin. Beyond keyword search: Discovering relevant scientific literature. In ACM Conference on Knowledge Discovery and Data Mining (KDD), 2011.
[5] Khalid El-Arini, Gaurav Veda, Dafna Shahaf, and Carlos Guestrin. Turning down the noise in the blogosphere. In ACM Conference on Knowledge Discovery and Data Mining (KDD), 2009.
[6] Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, and S. Muthukrishnan. Adaptive submodular maximization in bandit setting. In Neural Information Processing Systems (NIPS), 2013.
[7] Daniel Golovin and Andreas Krause. Adaptive submodularity: A new approach to active learning and stochastic optimization. In Conference on Learning Theory (COLT), 2010.
[8] Manuel Gomez Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. In ACM Conference on Knowledge Discovery and Data Mining (KDD), 2010.
[9] Andrew Guillory. Active Learning and Submodular Functions. PhD thesis, University of Washington, 2012.
[10] Andrew Guillory and Jeff Bilmes. Interactive submodular set cover. In International Conference on Machine Learning (ICML), 2010.
[11] Andrew Guillory and Jeff Bilmes. Simultaneous learning and covering with adversarial noise. In International Conference on Machine Learning (ICML), 2011.
[12] Steve Hanneke. The complexity of interactive machine learning. Master's thesis, Carnegie Mellon University, 2007.
[13] Shervin Javdani, Yuxin Chen, Amin Karbasi, Andreas Krause, J. Andrew Bagnell, and Siddhartha Srinivasa. Near optimal Bayesian active learning for decision making.
In Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
[14] Shervin Javdani, Matthew Klingensmith, J. Andrew Bagnell, Nancy Pollard, and Siddhartha Srinivasa. Efficient touch based localization through submodularity. In IEEE International Conference on Robotics and Automation (ICRA), 2013.
[15] Andreas Krause, Ajit Singh, and Carlos Guestrin. Near-optimal sensor placements in Gaussian processes. In International Conference on Machine Learning (ICML), 2005.
[16] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection in networks. In ACM Conference on Knowledge Discovery and Data Mining (KDD), 2007.
[17] Hui Lin and Jeff Bilmes. Learning mixtures of submodular shells with application to document summarization. In Conference on Uncertainty in Artificial Intelligence (UAI), 2012.
[18] George Nemhauser, Laurence Wolsey, and Marshall Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978.
[19] Adarsh Prasad, Stefanie Jegelka, and Dhruv Batra. Submodular meets structured: Finding diverse subsets in exponentially-large structured item sets. In Neural Information Processing Systems (NIPS), 2014.
[20] Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. Learning diverse rankings with multi-armed bandits. In International Conference on Machine Learning (ICML), 2008.
[21] Karthik Raman, Pannaga Shivaswamy, and Thorsten Joachims. Online learning to diversify from implicit feedback. In ACM Conference on Knowledge Discovery and Data Mining (KDD), 2012.
[22] Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, and J. Andrew Bagnell. Learning policies for contextual submodular prediction. In International Conference on Machine Learning (ICML), 2013.
[23] Sebastian Tschiatschek, Rishabh Iyer, Haochen Wei, and Jeff Bilmes.
Learning mixtures of submodular functions for image collection summarization. In Neural Information Processing Systems (NIPS), 2014.
[24] Laurence A. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393, 1982.
[25] Yisong Yue and Carlos Guestrin. Linear submodular bandits and their application to diversified retrieval. In Neural Information Processing Systems (NIPS), 2011.
[26] Yisong Yue and Thorsten Joachims. Predicting diverse subsets using structural SVMs. In International Conference on Machine Learning (ICML), 2008.