{"title": "Streaming Robust Submodular Maximization: A Partitioned Thresholding Approach", "book": "Advances in Neural Information Processing Systems", "page_first": 4557, "page_last": 4566, "abstract": "We study the classical problem of maximizing a monotone submodular function subject to a cardinality constraint k, with two additional twists: (i) elements arrive in a streaming fashion, and (ii) m items from the algorithm\u2019s memory are removed after the stream is finished. We develop a robust submodular algorithm STAR-T. It is based on a novel partitioning structure and an exponentially decreasing thresholding rule. STAR-T makes one pass over the data and retains a short but robust summary. We show that after the removal of any m elements from the obtained summary, a simple greedy algorithm STAR-T-GREEDY that runs on the remaining elements achieves a constant-factor approximation guarantee. In two different data summarization tasks, we demonstrate that it matches or outperforms existing greedy and streaming methods, even if they are allowed the benefit of knowing the removed subset in advance.", "full_text": "Streaming Robust Submodular Maximization: A Partitioned Thresholding Approach

Slobodan Mitrović∗ (EPFL), Ilija Bogunović† (EPFL), Ashkan Norouzi-Fard‡ (EPFL), Jakub Tarnawski§ (EPFL), Volkan Cevher¶ (EPFL)

Abstract

We study the classical problem of maximizing a monotone submodular function subject to a cardinality constraint k, with two additional twists: (i) elements arrive in a streaming fashion, and (ii) m items from the algorithm's memory are removed after the stream is finished. We develop a robust submodular algorithm STAR-T. It is based on a novel partitioning structure and an exponentially decreasing thresholding rule. STAR-T makes one pass over the data and retains a short but robust summary.
We show that after the removal of any m elements from the obtained summary, a simple greedy algorithm STAR-T-GREEDY that runs on the remaining elements achieves a constant-factor approximation guarantee. In two different data summarization tasks, we demonstrate that it matches or outperforms existing greedy and streaming methods, even if they are allowed the benefit of knowing the removed subset in advance.

1 Introduction

A central challenge in many large-scale machine learning tasks is data summarization – the extraction of a small representative subset out of a large dataset. Applications include image and document summarization [1, 2], influence maximization [3], facility location [4], exemplar-based clustering [5], recommender systems [6], and many more. Data summarization can often be formulated as the problem of maximizing a submodular set function subject to a cardinality constraint.

On small datasets, a popular algorithm is the simple greedy method [7], which produces solutions provably close to optimal. Unfortunately, it requires repeated access to all elements, which makes it infeasible for large-scale scenarios, where the entire dataset does not fit in the main memory. In this setting, streaming algorithms prove to be useful, as they make only a small number of passes over the data and use sublinear space.

In many settings, the extracted representative set is also required to be robust. That is, the objective value should degrade as little as possible when some elements of the set are removed.
Such removals may arise for any number of reasons, such as failures of nodes in a network, or user preferences which the model failed to account for; they could even be adversarial in nature.

∗e-mail: slobodan.mitrovic@epfl.ch
†e-mail: ilija.bogunovic@epfl.ch
‡e-mail: ashkan.norouzifard@epfl.ch
§e-mail: jakub.tarnawski@epfl.ch
¶e-mail: volkan.cevher@epfl.ch

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

A robustness requirement is especially challenging for large datasets, where it is prohibitively expensive to reoptimize over the entire data collection in order to find replacements for the removed elements. In some applications, where data is produced so rapidly that most of it is not being stored, such a search for replacements may not be possible at all.

These requirements lead to the following two-stage setting. In the first stage, we wish to solve the robust streaming submodular maximization problem – one of finding a small representative subset of elements that is robust against any possible removal of up to m elements. In the second, query stage, after an arbitrary removal of m elements from the summary obtained in the first stage, the goal is to return a representative subset, of size at most k, using only the precomputed summary rather than the entire dataset.

For example, (i) in the dominating set problem (also studied under influence maximization) we want to efficiently (in a single pass) compute a compressed but robust set of influential users in a social network (whom we will present with free copies of a new product); (ii) in personalized movie recommendation we want to efficiently precompute a robust set of user-preferred movies. Once we discard those users who will not spread the word about our product, we should find a new set of influential users in the precomputed robust summary.
Similarly, if some movies turn out not to be interesting for the user, we should still be able to provide good recommendations by only looking into our robust movie summary.

Contributions. In this paper, we propose a two-stage procedure for robust submodular maximization. For the first stage, we design a streaming algorithm which makes one pass over the data and finds a summary that is robust against removal of up to m elements, while containing at most O((m log k + k) log² k) elements.

In the second (query) stage, given any set of size m that has been removed from the obtained summary, we use a simple greedy algorithm that runs on the remaining elements and produces a solution of size at most k (without needing to access the entire dataset). We prove that this solution satisfies a constant-factor approximation guarantee.

Achieving this result requires novelty in the algorithm design as well as the analysis. Our streaming algorithm uses a structure where the constructed summary is arranged into partitions consisting of buckets whose sizes increase exponentially with the partition index. Moreover, buckets in different partitions are associated with greedy thresholds, which decrease exponentially with the partition index. Our analysis exploits and combines the properties of the described robust structure and decreasing greedy thresholding rule.

In addition to algorithmic and theoretical contributions, we also demonstrate in several practical scenarios that our procedure matches (and in some cases outperforms) the SIEVE-STREAMING algorithm [8] (see Section 5) – even though we allow the latter to know in advance which elements will be removed from the dataset.

2 Problem Statement

We consider a potentially large universe of elements V of size n equipped with a normalized monotone submodular set function f : 2^V → R≥0 defined on V.
We say that f is monotone if for any two sets X ⊆ Y ⊆ V we have f(X) ≤ f(Y). The set function f is said to be submodular if for any two sets X ⊆ Y ⊆ V and any element e ∈ V \ Y it holds that

f(X ∪ {e}) − f(X) ≥ f(Y ∪ {e}) − f(Y).

We use f(Y | X) to denote the marginal gain in the function value due to adding the elements of set Y to set X, i.e., f(Y | X) := f(X ∪ Y) − f(X). We say that f is normalized if f(∅) = 0.

The problem of maximizing a monotone submodular function subject to a cardinality constraint, i.e.,

max_{Z ⊆ V, |Z| ≤ k} f(Z),   (1)

has been studied extensively. It is well known that a simple greedy algorithm (henceforth referred to as GREEDY) [7], which starts from an empty set and then iteratively adds the element with the highest marginal gain, provides a (1 − e⁻¹)-approximation. However, it requires repeated access to all elements of the dataset, which precludes it from use in large-scale machine learning applications.

We say that a set S is robust for a parameter m if, for any set E ⊆ V such that |E| ≤ m, there is a subset Z ⊆ S \ E of size at most k such that

f(Z) ≥ c · f(OPT(k, V \ E)),

where c > 0 is an approximation ratio. We use OPT(k, V \ E) to denote the optimal subset of size k of V \ E (i.e., after the removal of elements in E):

OPT(k, V \ E) ∈ argmax_{Z ⊆ V \ E, |Z| ≤ k} f(Z).

In this work, we are interested in solving a robust version of Problem (1) in the setting that consists of the following two stages: (i) streaming and (ii) query stage.

In the streaming stage, elements from the ground set V arrive in a streaming fashion in an arbitrary order. Our goal is to design a one-pass streaming algorithm that has oracle access to f and retains a small set S of elements in memory.
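To make the definitions above concrete, the following toy sketch (our illustration, not code from the paper) implements a coverage function, which is normalized, monotone and submodular, together with the GREEDY routine just described:

```python
# Toy example: f(Z) = |union of the sets indexed by Z| (a coverage function).
# Coverage is normalized (f(empty) = 0), monotone, and submodular.

def coverage(sets, Z):
    """Objective value f(Z): number of ground elements covered by Z."""
    covered = set()
    for i in Z:
        covered |= sets[i]
    return len(covered)

def greedy(sets, k):
    """GREEDY: iteratively add the index with the largest marginal gain f(e | Z)."""
    Z = []
    for _ in range(k):
        best, best_gain = None, 0
        for e in range(len(sets)):
            if e in Z:
                continue
            gain = coverage(sets, Z + [e]) - coverage(sets, Z)  # f(e | Z)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no remaining element has positive marginal gain
            break
        Z.append(best)
    return Z

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
Z = greedy(sets, k=2)
print(Z, coverage(sets, Z))  # [2, 0] 7
```

Each greedy step here evaluates f only through oracle calls, which is the access model assumed throughout the paper.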
In addition, we want S to be a robust summary, i.e., S should both contain elements that maximize the objective value, and be robust against the removal of a prespecified number of elements m. In the query stage, after any set E of size at most m is removed from V, the goal is to return a set Z ⊆ S \ E of size at most k such that f(Z) is maximized.

Related work. A robust, non-streaming version of Problem (1) was first introduced in [9]. In that setting, the algorithm must output a set Z of size k which maximizes the smallest objective value guaranteed to be obtained after a set of size m is removed, that is,

max_{Z ⊆ V, |Z| ≤ k} min_{E ⊆ Z, |E| ≤ m} f(Z \ E).

The work [10] provides the first constant (0.387) factor approximation result to this problem, valid for m = o(√k). Their solution consists of buckets of size O(m² log k) that are constructed greedily, one after another. Recently, in [11], a centralized algorithm PRO has been proposed that achieves the same approximation result and allows for a greater robustness m = o(k). PRO constructs a set that is arranged into partitions consisting of buckets whose sizes increase exponentially with the partition index. In this work, we use a similar structure for the robust set but, instead of filling the buckets greedily one after another, we place an element in the first bucket for which the gain of adding the element is above the corresponding threshold. Moreover, we introduce a novel analysis that allows us to be robust to any number of removals m as long as we are allowed to use O(m log² k) memory.

Recently, submodular streaming algorithms (e.g., [5], [12] and [13]) have become a prominent option for scaling submodular optimization to large-scale machine learning applications.
A popular submodular streaming algorithm SIEVE-STREAMING [8] solves Problem (1) by performing one pass over the data, and achieves a (0.5 − ε)-approximation while storing at most O((k log k)/ε) elements.

Our algorithm extends the algorithmic ideas of SIEVE-STREAMING, such as greedy thresholding, to the robust setting. In particular, we introduce a new exponentially decreasing thresholding scheme that, together with an innovative analysis, allows us to obtain a constant-factor approximation for the robust streaming problem.

Recently, robust versions of submodular maximization have been considered in the problems of influence maximization (e.g., [3], [14]) and budget allocation ([15]). Increased interest in interactive machine learning methods has also led to the development of interactive and adaptive submodular optimization (see e.g. [16], [17]). Our procedure also contains an interactive component, as we can compute the robust summary only once and then provide different sub-summaries that correspond to multiple different removals (see Section 5.2).

Independently and concurrently with our work, [18] gave a streaming algorithm for robust submodular maximization under the cardinality constraint. Their approach provides a (1/2 − ε) approximation guarantee. However, their algorithm uses O(mk log k/ε) memory. While the memory requirement of their method increases linearly with k, in the case of our algorithm this dependence is logarithmic.

Figure 1: Illustration of the set S returned by STAR-T. It consists of ⌈log k⌉ + 1 partitions such that each partition i contains w⌈k/2^i⌉ buckets of size 2^i (up to rounding).
Moreover, each partition i has its corresponding threshold τ/2^i.

3 A Robust Two-stage Procedure

Our approach consists of the streaming Algorithm 1, which we call Streaming Robust submodular algorithm with Partitioned Thresholding (STAR-T). This algorithm is used in the streaming stage, while Algorithm 2, which we call STAR-T-GREEDY, is used in the query stage.

As input, STAR-T requires a non-negative monotone submodular function f, a cardinality constraint k, a robustness parameter m and a thresholding parameter τ. The parameter τ is an α-approximation to f(OPT(k, V \ E)), for some α ∈ (0, 1] to be specified later. Hence, it depends on f(OPT(k, V \ E)), which is not known a priori. For the sake of clarity, we present the algorithm as if f(OPT(k, V \ E)) were known, and in Section 4.1 we show how f(OPT(k, V \ E)) can be approximated. The algorithm makes one pass over the data and outputs a set of elements S that is later used in the query stage in STAR-T-GREEDY.

The set S (see Figure 1 for an illustration) is divided into ⌈log k⌉ + 1 partitions, where every partition i ∈ {0, …, ⌈log k⌉} consists of w⌈k/2^i⌉ buckets B_{i,j}, j ∈ {1, …, w⌈k/2^i⌉}. Here, w ∈ N⁺ is a memory parameter that depends on m; we use w ≥ ⌈4⌈log k⌉m/k⌉ in our asymptotic theory, while our numerical results show that w = 1 works well in practice. Every bucket B_{i,j} stores at most min{2^i, k} elements. If |B_{i,j}| = min{2^i, k}, then we say that B_{i,j} is full.

Every partition has a corresponding threshold that is exponentially decreasing with the partition index i, namely τ/2^i. For example, the buckets in the first partition will only store elements that have marginal value at least τ.
Every element e ∈ V arriving on the stream is assigned to the first non-full bucket B_{i,j} for which the marginal value f(e | B_{i,j}) is at least τ/2^i. If there is no such bucket, the element is not stored. Hence, the buckets are disjoint sets that in the end (after one pass over the data) can contain fewer elements than specified by their corresponding cardinality constraints, and some of them might even be empty. The set S returned by STAR-T is the union of all the buckets.

In the second stage, STAR-T-GREEDY receives as input the set S constructed in the streaming stage, a set E ⊂ S that we think of as removed elements, and the cardinality constraint k. The algorithm then returns a set Z, of size at most k, that is obtained by running the simple greedy algorithm GREEDY on the set S \ E. Note that STAR-T-GREEDY can be invoked for different sets E.

4 Theoretical Bounds

In this section we discuss our main theoretical results. We initially assume that the value f(OPT(k, V \ E)) is known; later, in Section 4.1, we remove this assumption. More detailed versions of our proofs are given in the supplementary material.
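Before turning to the analysis, the two-stage procedure just described can be sketched in a few lines of Python. This is a simplified illustration under the stated rules (buckets of capacity min{2^i, k}, per-bucket threshold τ/min{2^i, k}, plain greedy on S \ E), with τ assumed to be given; it is not the authors' reference implementation:

```python
import math

def star_t(stream, f, k, tau, w=1):
    """One-pass STAR-T sketch: try to place each arriving element into the
    first non-full bucket whose (exponentially decreasing) threshold it passes."""
    L = math.ceil(math.log2(k))
    # Partition i has w * ceil(k / 2^i) buckets, each of capacity min(2^i, k).
    n_buckets = [w * math.ceil(k / 2 ** i) for i in range(L + 1)]
    buckets = {(i, j): [] for i in range(L + 1) for j in range(n_buckets[i])}
    for e in stream:
        placed = False
        for i in range(L + 1):          # loop over partitions
            cap = min(2 ** i, k)
            for j in range(n_buckets[i]):  # loop over buckets
                B = buckets[(i, j)]
                # marginal gain f(e | B) must clear the threshold tau / min(2^i, k)
                if len(B) < cap and f(B + [e]) - f(B) >= tau / cap:
                    B.append(e)
                    placed = True
                    break
            if placed:
                break                    # proceed to the next stream element
    return [e for B in buckets.values() for e in B]  # the summary S

def star_t_greedy(S, E, f, k):
    """Query stage: plain GREEDY on S \\ E, returning at most k elements."""
    ground = [e for e in S if e not in E]
    Z = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in ground:
            if e in Z:
                continue
            gain = f(Z + [e]) - f(Z)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:
            break
        Z.append(best)
    return Z
```

Note that `star_t_greedy` can be re-run on the same summary `S` for different removal sets `E`, mirroring the interactive use described in Section 5.2.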
We begin by stating the main result.

Algorithm 1 STreAming Robust - Thresholding submodular algorithm (STAR-T)
Input: Set V, k, τ, w ∈ N⁺
1: B_{i,j} ← ∅ for all 0 ≤ i ≤ ⌈log k⌉ and 1 ≤ j ≤ w⌈k/2^i⌉
2: for each element e in the stream do
3:   for i ← 0 to ⌈log k⌉ do   ▷ loop over partitions
4:     for j ← 1 to w⌈k/2^i⌉ do   ▷ loop over buckets
5:       if |B_{i,j}| < min{2^i, k} and f(e | B_{i,j}) ≥ τ/min{2^i, k} then
6:         B_{i,j} ← B_{i,j} ∪ {e}
7:         break: proceed to the next element in the stream
8: S ← ∪_{i,j} B_{i,j}
9: return S

Algorithm 2 STAR-T-GREEDY
Input: Set S, query set E and k
1: Z ← GREEDY(k, S \ E)
2: return Z

Theorem 4.1 Let f be a normalized monotone submodular function defined over the ground set V. Given a cardinality constraint k and parameter m, for a setting of parameters w ≥ ⌈4⌈log k⌉m/k⌉ and

τ = f(OPT(k, V \ E)) / ( 2 + ((1 − e⁻¹)/(1 − e^{−1/3})) (1 − 1/⌈log k⌉) ),

STAR-T performs a single pass over the data set and constructs a set S of size at most O((k + m log k) log k) elements.

For such a set S and any set E ⊆ V such that |E| ≤ m, STAR-T-GREEDY yields a set Z ⊆ S \ E of size at most k with

f(Z) ≥ c · f(OPT(k, V \ E)),

for c = 0.149 (1 − 1/⌈log k⌉). Therefore, as k → ∞, the value of c approaches 0.149.

Proof sketch.
We first consider the case when there is a partition i* in S such that at least half of its buckets are full. We show that there is at least one full bucket B_{i*,j} such that f(B_{i*,j} \ E) is only a constant factor smaller than f(OPT(k, V \ E)), as long as the threshold τ is set close to f(OPT(k, V \ E)). We make this statement precise in the following lemma:

Lemma 4.2 If there exists a partition in S such that at least half of its buckets are full, then for the set Z produced by STAR-T-GREEDY we have

f(Z) ≥ (1 − e⁻¹) (1 − 4m/(wk)) τ.   (2)

To prove this lemma, we first observe that from the properties of GREEDY it follows that

f(Z) = f(GREEDY(k, S \ E)) ≥ (1 − e⁻¹) f(B_{i*,j} \ E).

Now it remains to show that f(B_{i*,j} \ E) is close to τ. We observe that for any full bucket B_{i*,j}, we have |B_{i*,j}| = min{2^i, k}, so its objective value f(B_{i*,j}) is at least τ (every element added to this bucket increases its objective value by at least τ/min{2^i, k}). On average, |B_{i*,j} ∩ E| is relatively small, and hence we can show that there exists some full bucket B_{i*,j} such that f(B_{i*,j} \ E) is close to f(B_{i*,j}).

Next, we consider the other case, i.e., when for every partition, more than half of its buckets are not full after the execution of STAR-T. For every partition i, we let B_i denote a bucket that is not fully populated and for which |B_i ∩ E| is minimized over all the buckets of that partition. Then, we look at such a bucket in the last partition: B_{⌈log k⌉}.

We provide two lemmas that depend on f(B_{⌈log k⌉}).
If τ is set to be small compared to f(OPT(k, V \ E)):

• Lemma 4.3 shows that if f(B_{⌈log k⌉}) is close to f(OPT(k, V \ E)), then our solution is within a constant factor of f(OPT(k, V \ E));
• Lemma 4.4 shows that if f(B_{⌈log k⌉}) is small compared to f(OPT(k, V \ E)), then our solution is again within a constant factor of f(OPT(k, V \ E)).

Lemma 4.3 If there does not exist a partition of S such that at least half of its buckets are full, then for the set Z produced by STAR-T-GREEDY we have

f(Z) ≥ (1 − e^{−1/3}) ( f(B_{⌈log k⌉}) − (4m/(wk)) τ ),

where B_{⌈log k⌉} is a not-fully-populated bucket in the last partition that minimizes |B_{⌈log k⌉} ∩ E|, and |E| ≤ m.

Using standard properties of submodular functions and the GREEDY algorithm we can show that

f(Z) = f(GREEDY(k, S \ E)) ≥ (1 − e^{−1/3}) ( f(B_{⌈log k⌉}) − (4m/(wk)) τ ).

The complete proof of this result can be found in Lemma B.2, in the supplementary material.

Lemma 4.4 If there does not exist a partition of S such that at least half of its buckets are full, then for the set Z produced by STAR-T-GREEDY,

f(Z) ≥ (1 − e⁻¹) ( f(OPT(k, V \ E)) − f(B_{⌈log k⌉}) − τ ),

where B_{⌈log k⌉} is any not-fully-populated bucket in the last partition.

To prove this lemma, we look at two sets X and Y, where Y contains all the elements from OPT(k, V \ E) that are placed in the buckets that precede bucket B_{⌈log k⌉} in S, and set X := OPT(k, V \ E) \ Y.
By monotonicity and submodularity of f, we bound f(Y) by:

f(Y) ≥ f(OPT(k, V \ E)) − f(X) ≥ f(OPT(k, V \ E)) − f(B_{⌈log k⌉}) − Σ_{e ∈ X} f(e | B_{⌈log k⌉}).

To bound the sum on the right hand side we use that for every e ∈ X we have f(e | B_{⌈log k⌉}) < τ/k, which holds due to the fact that B_{⌈log k⌉} is a bucket in the last partition and is not fully populated. We conclude the proof by showing that f(Z) = f(GREEDY(k, S \ E)) ≥ (1 − e⁻¹) f(Y).

Equipped with the above results, we proceed to prove our main result.

Proof of Theorem 4.1. First, we prove the bound on the size of S:

|S| = Σ_{i=0}^{⌈log k⌉} w⌈k/2^i⌉ min{2^i, k} ≤ Σ_{i=0}^{⌈log k⌉} w(k/2^i + 1) 2^i ≤ (log k + 5)wk.   (3)

By setting w ≥ ⌈4⌈log k⌉m/k⌉ we obtain |S| = O((k + m log k) log k).

Next, we show the approximation guarantee. We first define γ := 4m/(wk), α₁ := 1 − e^{−1/3}, and α₂ := 1 − e⁻¹. Lemmas 4.3 and 4.4 provide two bounds on f(Z), one increasing and one decreasing in f(B_{⌈log k⌉}). By balancing out the two bounds, we derive

f(Z) ≥ ( α₁α₂/(α₁ + α₂) ) ( f(OPT(k, V \ E)) − (1 + γ)τ ),   (4)

with equality for f(B_{⌈log k⌉}) = ( α₂ f(OPT(k, V \ E)) − (α₂ − γα₁)τ ) / (α₂ + α₁).

Next, as γ ≥ 0, we can observe that Eq. (4) is decreasing, while the bound on f(Z) given by Lemma 4.2 is increasing in τ for γ < 1.
Hence, by balancing out the two inequalities, we obtain our final bound

f(Z) ≥ ( 1 / ( 2/(α₂(1 − γ)) + 1/α₁ ) ) f(OPT(k, V \ E)).   (5)

For w ≥ ⌈4⌈log k⌉m/k⌉ we have γ ≤ 1/⌈log k⌉, and hence, by substituting α₁ and α₂ in Eq. (5), we prove our main result:

f(Z) ≥ ( (1 − e^{−1/3})(1 − e⁻¹)(1 − 1/⌈log k⌉) / ( 2(1 − e^{−1/3}) + (1 − e⁻¹) ) ) f(OPT(k, V \ E))
     ≥ 0.149 (1 − 1/⌈log k⌉) f(OPT(k, V \ E)).

4.1 Algorithm without access to f(OPT(k, V \ E))

Algorithm STAR-T requires in its input a parameter τ which is a function of the unknown value f(OPT(k, V \ E)). To deal with this shortcoming, we show how to extend the idea of [8] of maintaining multiple parallel instances of our algorithm in order to approximate f(OPT(k, V \ E)). For a given constant ε > 0, this approach increases the space by a factor of log_{1+ε} k and loses a factor of (1 + ε) in the approximation guarantee compared to the value obtained in Theorem 4.1.
More precisely, we prove the following theorem.

Theorem 4.5 For any given constant ε > 0 there exists a parallel variant of STAR-T that makes one pass over the stream and outputs a collection of sets S of total size O((k + m log k) log k · log_{1+ε} k) with the following property: there exists a set S ∈ S such that applying STAR-T-GREEDY on S yields a set Z ⊆ S \ E of size at most k with

f(Z) ≥ ( 0.149/(1 + ε) ) (1 − 1/⌈log k⌉) f(OPT(k, V \ E)).

The proof of this theorem, along with a description of the corresponding algorithm, is provided in Appendix E.

5 Experiments

In this section, we numerically validate the claims outlined in the previous section. Namely, we test the robustness and compare the performance of our algorithm against the SIEVE-STREAMING algorithm that knows in advance which elements will be removed. We demonstrate improved or matching performance in two different data summarization applications: (i) the dominating set problem, and (ii) personalized movie recommendation. We illustrate how a single robust summary can be used to regenerate recommendations corresponding to multiple different removals.

5.1 Dominating Set

In the dominating set problem, given a graph G = (V, M), where V represents the set of nodes and M stands for the edges, the objective function is given by f(Z) = |N(Z) ∪ Z|, where N(Z) denotes the neighborhood of Z (all nodes adjacent to any node of Z). This objective function is monotone and submodular.

We consider two datasets: (i) ego-Twitter [19], consisting of 973 social circles from Twitter, which form a directed graph with 81306 nodes and 1768149 edges; (ii) the Amazon product co-purchasing network [20]: a directed graph with 317914 nodes and 1745870 edges.

Given the dominating set objective function, we run STAR-T to obtain the robust summary S.
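The dominating-set objective above is straightforward to implement as an oracle; the sketch below is our illustration, with the graph represented as an assumed adjacency dictionary:

```python
# Sketch of the dominating-set objective f(Z) = |N(Z) ∪ Z| from Section 5.1.
# The adjacency-dict representation here is our assumption, not the paper's code.
def dominating_set_value(adj, Z):
    covered = set(Z)                    # Z itself is always covered
    for v in Z:
        covered |= adj.get(v, set())    # N(Z): all out-neighbours of nodes in Z
    return len(covered)

adj = {0: {1, 2}, 1: {2}, 3: {4}}
print(dominating_set_value(adj, [0]))     # 3, covers {0, 1, 2}
print(dominating_set_value(adj, [0, 3]))  # 5, covers {0, 1, 2, 3, 4}
```

Since this is a coverage-type function, it is monotone and submodular, so it can be plugged directly into the streaming and query stages as the oracle f.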
Then we compare the performance of STAR-T-GREEDY, which runs on S, against the performance of SIEVE-STREAMING, which we allow to know in advance which elements will be removed. We also compare against a method that chooses the same number of elements as STAR-T, but does so uniformly at random from the set of all elements that will not be removed (V \ E); we refer to it as RANDOM. Finally, we also demonstrate the performance of STAR-T-SIEVE, a variant of our algorithm that uses the same robust summary S, but instead of running GREEDY in the second stage, it runs SIEVE-STREAMING on S \ E.

Figure 2: Numerical comparisons of the algorithms STAR-T-GREEDY, STAR-T-SIEVE and SIEVE-STREAMING.

Figures 2(a,c) show the objective value after the random removal of k elements from the set S, for different values of k. Note that E is sampled as a subset of the summary of our algorithm, which hurts the performance of our algorithm more than the baselines. The reported numbers are averaged over 100 iterations. STAR-T-GREEDY, STAR-T-SIEVE and SIEVE-STREAMING perform comparably (STAR-T-GREEDY slightly outperforms the other two), while RANDOM is significantly worse.

In Figures 2(b,d) we plot the objective value for different values of k after the removal of 2k elements from the set S, chosen greedily (i.e., by iteratively removing the element that reduces the objective value the most). Again, STAR-T-GREEDY, STAR-T-SIEVE and SIEVE-STREAMING perform comparably, but this time SIEVE-STREAMING slightly outperforms the other two for some values of k. We observe that even when we remove more than k elements from S, the performance of our algorithm is still comparable to the performance of SIEVE-STREAMING (which knows in advance which elements will be removed). We provide additional results in the supplementary material.

5.2 Interactive Personalized Movie Recommendation

The next application we consider is personalized movie recommendation.
We use the MovieLens 1M database [21], which contains 1000209 ratings for 3900 movies by 6040 users. Based on these ratings, we obtain feature vectors for each movie and each user by using standard low-rank matrix completion techniques [22]; we choose the number of features to be 30.

For a user u, we use the following monotone submodular function to recommend a set of movies Z:

f_u(Z) = (1 − α) · Σ_{z ∈ Z} ⟨v_u, v_z⟩ + α · Σ_{m ∈ M} max_{z ∈ Z} ⟨v_m, v_z⟩.

The first term aggregates the predicted scores of the chosen movies z ∈ Z for the user u (here v_u and v_z are non-normalized feature vectors of user u and movie z, respectively). The second term corresponds to a facility-location objective that measures how well the set Z covers the set of all movies M [4]. Finally, α is a user-dependent parameter that specifies the importance of global movie coverage versus high scores of individual movies.

Here, the robust setting arises naturally, since we do not have complete information about the user: when shown a collection of top movies, it will likely turn out that they have watched (but not rated) many of them, rendering these recommendations moot. In such an interactive setting, the user may also require (or exclude) movies of a specific genre, or movies similar to some favorite movie.

We compare the performance of our algorithms STAR-T-GREEDY and STAR-T-SIEVE in such scenarios against two baselines: GREEDY and SIEVE-STREAMING (both being run on the set V \ E, i.e., knowing the removed elements in advance).
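A sketch of this objective in NumPy is shown below. The toy feature matrices are illustrative stand-ins for the 30-dimensional matrix-completion factors, and submodularity of the facility-location term assumes nonnegative inner products:

```python
import numpy as np

def f_u(v_u, V_movies, Z, alpha):
    """(1 - alpha) * sum_{z in Z} <v_u, v_z>  +  alpha * sum_{m in M} max_{z in Z} <v_m, v_z>."""
    if len(Z) == 0:
        return 0.0  # f is normalized: f(empty) = 0
    VZ = V_movies[list(Z)]                                  # |Z| x d features of chosen movies
    personal = float((VZ @ v_u).sum())                      # predicted scores for user u
    coverage = float((V_movies @ VZ.T).max(axis=1).sum())   # facility-location term over all movies M
    return (1 - alpha) * personal + alpha * coverage

# Toy stand-in features (assumed values, for illustration only):
V_movies = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
v_u = np.array([1.0, 2.0])
print(f_u(v_u, V_movies, [2], alpha=0.5))  # 3.5
```

With alpha near 1 the coverage term dominates and the recommendations spread over the movie space; with alpha near 0 the set concentrates on the user's highest predicted scores, matching the trade-off described above.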
Note that in this case we are able to afford running GREEDY, which may be infeasible when working with larger datasets. Below we discuss two concrete practical scenarios featured in our experiments.

[Figure 2 panels: (a) Amazon communities, |E| = k; (b) Amazon communities, |E| = 2k; (c) ego-Twitter, |E| = k; (d) ego-Twitter, |E| = 2k; (e) Movies, already-seen; (f) Movies, by genre.]

Movies by genre. After we have built our summary S, the user decides to watch a drama today; we retrieve only movies of this genre from S. This corresponds to removing 59% of the universe V. In Figure 2(f) we report the quality of our output compared to the baselines (for user ID 445 and α = 0.95) for different values of k. The performance of STAR-T-GREEDY is within several percent of the performance of GREEDY (which we can consider as a tractable optimum), and the two sieve-based methods STAR-T-SIEVE and SIEVE-STREAMING display similar objective values.

Already-seen movies. We randomly sample a set E of movies already watched by the user (500 out of all 3900 movies). To obtain a realistic subset, each movie is sampled proportionally to its popularity (number of ratings). Figure 2(e) shows the performance of our algorithm faced with the removal of E (user ID = 445, α = 0.9) for a range of settings of k.
Again, our algorithm is able to almost match the objective values of GREEDY (which is aware of E in advance).
Recall that we are able to use the same precomputed summary S for different removed sets E. This summary was built for parameter w = 1, which theoretically allows for up to k removals. However, despite having |E| ≫ k in the above scenarios, our performance remains robust; this indicates that our method is more resilient in practice than the proved bound alone would guarantee.

6 Conclusion

We have presented a new robust submodular streaming algorithm STAR-T based on a novel partitioning structure and an exponentially decreasing thresholding rule. It makes one pass over the data and retains a set of size $O\left((k + m \log k) \log^2 k\right)$. We have further shown that after the removal of any m elements, a simple greedy algorithm that runs on the obtained set achieves a constant-factor approximation guarantee for robust submodular function maximization. In addition, we have presented two numerical studies where our method compares favorably against the SIEVE-STREAMING algorithm that knows in advance which elements will be removed.

Acknowledgment. IB and VC's work was supported in part by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement number 725594), in part by the Swiss National Science Foundation (SNF), project 407540_167319/1, in part by the NCCR MARVEL, funded by the Swiss National Science Foundation, in part by the Hasler Foundation Switzerland under grant agreement number 16066, and in part by the Office of Naval Research (ONR) under grant agreement number N00014-16-R-BA01. JT's work was supported by ERC Starting Grant 335288-OptApprox.

References
[1] S. Tschiatschek, R. K. Iyer, H. Wei, and J. A.
Bilmes, “Learning mixtures of submodular functions for image collection summarization,” in Advances in Neural Information Processing Systems, 2014, pp. 1413–1421.

[2] H. Lin and J. Bilmes, “A class of submodular functions for document summarization,” in Assoc. for Comp. Ling.: Human Language Technologies-Volume 1, 2011.

[3] D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the spread of influence through a social network,” in Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), 2003.

[4] E. Lindgren, S. Wu, and A. G. Dimakis, “Leveraging sparsity for efficient submodular data summarization,” in Advances in Neural Information Processing Systems, 2016, pp. 3414–3422.

[5] A. Krause and R. G. Gomes, “Budgeted nonparametric learning from data streams,” in ICML, 2010, pp. 391–398.

[6] K. El-Arini and C. Guestrin, “Beyond keyword search: discovering relevant scientific literature,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011, pp. 439–447.

[7] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of approximations for maximizing submodular set functions—I,” Mathematical Programming, vol. 14, no. 1, pp. 265–294, 1978.

[8] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause, “Streaming submodular maximization: Massive data summarization on the fly,” in Proceedings of the 20th ACM SIGKDD. ACM, 2014, pp. 671–680.

[9] A. Krause, H. B. McMahan, C. Guestrin, and A. Gupta, “Robust submodular observation selection,” Journal of Machine Learning Research, vol. 9, no. Dec, pp. 2761–2801, 2008.

[10] J. B. Orlin, A. S. Schulz, and R. Udwani, “Robust monotone submodular function maximization,” in Int. Conf. on Integer Programming and Combinatorial Opt. (IPCO). Springer, 2016.

[11] I.
Bogunovic, S. Mitrović, J. Scarlett, and V. Cevher, “Robust submodular maximization: A non-uniform partitioning approach,” in Int. Conf. Mach. Learn. (ICML), 2017.

[12] R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani, “Fast greedy algorithms in MapReduce and streaming,” ACM Transactions on Parallel Computing, vol. 2, no. 3, p. 14, 2015.

[13] A. Norouzi-Fard, A. Bazzi, I. Bogunovic, M. El Halabi, Y.-P. Hsieh, and V. Cevher, “An efficient streaming algorithm for the submodular cover problem,” in Adv. Neur. Inf. Proc. Sys. (NIPS), 2016.

[14] W. Chen, T. Lin, Z. Tan, M. Zhao, and X. Zhou, “Robust influence maximization,” in Proceedings of the ACM SIGKDD, 2016, p. 795.

[15] M. Staib and S. Jegelka, “Robust budget allocation via continuous submodular functions,” in Int. Conf. Mach. Learn. (ICML), 2017.

[16] D. Golovin and A. Krause, “Adaptive submodularity: Theory and applications in active learning and stochastic optimization,” Journal of Artificial Intelligence Research, vol. 42, 2011.

[17] A. Guillory and J. Bilmes, “Interactive submodular set cover,” arXiv preprint arXiv:1002.3345, 2010.

[18] B. Mirzasoleiman, A. Karbasi, and A. Krause, “Deletion-robust submodular maximization: Data summarization with “the right to be forgotten”,” in International Conference on Machine Learning, 2017, pp. 2449–2458.

[19] J. Mcauley and J. Leskovec, “Discovering social circles in ego networks,” ACM Trans. Knowl. Discov. Data, 2014.

[20] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.

[21] F. M. Harper and J. A. Konstan, “The MovieLens datasets: History and context,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, p.
19, 2016.

[22] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman, “Missing value estimation methods for DNA microarrays,” Bioinformatics, vol. 17, no. 6, pp. 520–525, 2001.