{"title": "Do Less, Get More: Streaming Submodular Maximization with Subsampling", "book": "Advances in Neural Information Processing Systems", "page_first": 732, "page_last": 742, "abstract": "In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. More specifically, for a monotone submodular function and a $p$-matchoid constraint, our randomized algorithm achieves a $4p$ approximation ratio (in expectation) with $O(k)$ memory and $O(km/p)$ queries per element ($k$ is the size of the largest feasible solution and $m$ is the number of matroids used to define the constraint). For the non-monotone case, our approximation ratio increases only slightly to $4p+2-o(1)$. To the best of our knowledge, our algorithm is the first that combines the benefits of streaming and subsampling in a novel way in order to truly scale submodular maximization to massive machine learning problems. To showcase its practicality, we empirically evaluated the performance of our algorithm on a video summarization application and observed that it outperforms the state-of-the-art algorithm by up to fifty-fold while maintaining practically the same utility. 
We also evaluated the scalability of our algorithm on a large dataset of Uber pick-up locations.", "full_text": "Do Less, Get More: Streaming Submodular Maximization with Subsampling\n\nMoran Feldman\nOpen University of Israel\nmoranfe@openu.ac.il\n\nAmin Karbasi\nYale University\namin.karbasi@yale.edu\n\nEhsan Kazemi\nYale University\nehsan.kazemi@yale.edu\n\nAbstract\n\nIn this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. More specifically, for a monotone submodular function and a $p$-matchoid constraint, our randomized algorithm achieves a $4p$ approximation ratio (in expectation) with $O(k)$ memory and $O(km/p)$ queries per element ($k$ is the size of the largest feasible solution and $m$ is the number of matroids used to define the constraint). For the non-monotone case, our approximation ratio increases only slightly to $4p + 2 - o(1)$. To the best of our knowledge, our algorithm is the first that combines the benefits of streaming and subsampling in a novel way in order to truly scale submodular maximization to massive machine learning problems. To showcase its practicality, we empirically evaluated the performance of our algorithm on a video summarization application and observed that it outperforms the state-of-the-art algorithm by up to fifty-fold while maintaining practically the same utility. We also evaluated the scalability of our algorithm on a large dataset of Uber pick-up locations.\n\n1 Introduction\n\nSubmodularity characterizes a wide variety of discrete optimization problems that naturally occur in machine learning and artificial intelligence [2]. 
Of particular interest is submodular maximization, which captures many novel instances of data summarization such as active set selection in non-parametric learning [31], image summarization [40], corpus summarization [28], fMRI parcellation [37], ensuring privacy and fairness [21], two-stage optimization [34] and removing redundant elements from DNA sequencing [27], to name a few.\n\nOften the collection of elements to be summarized is generated continuously, and it is important to maintain in real time a summary of the part of the collection generated so far. For example, a surveillance camera generates a continuous stream of frames, and it is desirable to be able to quickly get, at any given time point, a short summary of the frames taken so far. The na\u00efve way to handle such a data summarization task is to store the entire set of generated elements, and then, upon request, use an appropriate offline submodular maximization algorithm to generate a summary out of the stored set. Unfortunately, this approach is usually impractical, both because it requires the system to store the entire generated set of elements and because generating the summary from such a large amount of data can be very slow. These issues have motivated previous works to use streaming submodular maximization algorithms for data summarization tasks [1, 17, 32].\n\nThe first works (that we are aware of) to consider one-pass streaming algorithms for submodular maximization problems were the work of Badanidiyuru et al. 
[1], who described a $1/2$-approximation streaming algorithm for maximizing a monotone submodular function subject to a cardinality constraint, and the work of Chakrabarti and Kale [7], who gave a $4p$-approximation streaming algorithm for maximizing such functions subject to the intersection of $p$ matroid constraints. The last result was later extended by Chekuri et al. [8] to $p$-matchoid constraints.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nTable 1: Streaming algorithms for submodular maximization subject to a $p$-matchoid constraint.\n\nAlgorithm | Function | Approx. Ratio | Memory | Queries per Element | Reference\nDeterministic | Monotone | $4p$ | $O(k)$ | $O(km)$ | [8]\nRandomized | Non-monotone | $(5p+2+1/p)/(1-\\varepsilon)$ | $O(\\frac{k}{\\varepsilon^2}\\log\\frac{k}{\\varepsilon})$ | $O(\\frac{k^2m}{\\varepsilon^2}\\log\\frac{k}{\\varepsilon})$ | [8]\nDeterministic | Non-monotone | $(9p+O(\\sqrt{p}))/(1-\\varepsilon)$ | $O(\\frac{k}{\\varepsilon}\\log\\frac{k}{\\varepsilon})$ | $O(\\frac{km}{\\varepsilon}\\log\\frac{k}{\\varepsilon})$ | [8]\nDeterministic | Non-monotone | $4p+4\\sqrt{p}+1$ | $O(k\\sqrt{p})$ | $O(\\sqrt{p}\\,km)$ | [33]\u00b2\nRandomized | Monotone | $4p$ | $O(k)$ | $O(km/p)$ | This paper\nRandomized | Non-monotone | $4p+2-o(1)$ | $O(k)$ | $O(km/p)$ | This paper\n\nFor non-monotone submodular objectives, the first streaming result was obtained by Buchbinder et al. [5], who described a randomized streaming algorithm achieving an 11.197-approximation for the problem of maximizing a non-monotone submodular function subject to a single cardinality constraint. Then, Chekuri et al. [8] described an algorithm of the same kind achieving a $(5p + 2 + 1/p)/(1 - \\varepsilon)$-approximation for the problem of maximizing a non-monotone submodular function subject to a $p$-matchoid constraint, and a deterministic streaming algorithm achieving a $(9p + O(\\sqrt{p}))/(1 - \\varepsilon)$-approximation for the same problem.\u00b9 Finally, very recently, Mirzasoleiman et al. 
[33] came up with a different deterministic algorithm for the same problem achieving an approximation ratio of $4p + 4\\sqrt{p} + 1$.\n\nIn the field of submodular optimization, it is customary to assume that the algorithm has access to the objective function and the constraint through oracles. In particular, all the above algorithms assume access to a value oracle that, given a set $S$, returns the value of the objective function for this set, and to an independence oracle that, given a set $S$ and an input matroid, answers whether $S$ is feasible in that matroid. Given access to these oracles, the algorithms of Chakrabarti and Kale [7] and Chekuri et al. [8] for monotone submodular objective functions are quite efficient, requiring only $O(k)$ memory ($k$ is the size of the largest feasible set) and using only $O(km)$ value and independence oracle queries for processing a single element of the stream ($m$ is the number of matroids used to define the $p$-matchoid constraint). However, the algorithms developed for non-monotone submodular objectives are much less efficient (see Table 1 for their exact parameters).\n\nIn this paper, we describe a new randomized streaming algorithm for maximizing a submodular function subject to a $p$-matchoid constraint. 
Our algorithm obtains an improved approximation ratio of $2p + 2\\sqrt{p(p + 1)} + 1 = 4p + 2 - o(1)$, while using only $O(k)$ memory and $O(km/p)$ value and independence oracle queries (in expectation) per element of the stream, which is even fewer than the number of oracle queries used by the state-of-the-art algorithm for monotone submodular objectives. Moreover, when the objective function is monotone, our algorithm (with slightly different parameter values) achieves an approximation ratio of $4p$ using the same memory and oracle query complexities, i.e., it matches the state-of-the-art algorithm for monotone objectives in terms of the approximation ratio, while improving over it in terms of the number of value and independence oracle queries used. Additionally, we would like to point out that our algorithm also works in the online model with preemption suggested by Buchbinder et al. [5] for submodular maximization problems. Thus, our result for non-monotone submodular objectives represents the first non-trivial result in this model for such objectives for any constraint other than a single matroid constraint. Furthermore, despite the generality of our algorithm for a $p$-matchoid constraint (which includes, in particular, a cardinality constraint, a single matroid constraint and an intersection of multiple matroids), the approximation ratio that it achieves is the state-of-the-art for all the above special cases. For example, for a single matroid constraint ($p = 1$), our algorithm achieves an approximation ratio of $3 + 2\\sqrt{2} \\approx 5.828$, which improves over the previous state-of-the-art 8-approximation algorithm by Chekuri et al. [8].\n\nIn addition to mathematically analyzing our algorithm, we also studied its practical performance and scalability in video summarization and location summarization tasks. We observed that, while our algorithm preserves the quality of the produced summaries, it runs an order of magnitude faster than the state-of-the-art algorithm. We also studied the effect of imposing different $p$-matchoid constraints on these applications. Most of the proofs for the theoretical results are deferred to the Supplementary Material.\n\n\u00b9 The algorithms of [8] use an offline algorithm for the same problem in a black-box fashion, and their approximation ratios depend on the offline algorithm used. The approximation ratios stated here assume the state-of-the-art offline algorithms of [15], which were published only recently, and thus they are better than the approximation ratios stated by [8].\n\n\u00b2 The memory and query complexities of the algorithm of Mirzasoleiman et al. [33] have been calculated based on the corresponding complexities of the algorithm of [8] for monotone objectives and the properties of the reduction used by [33]. We note that these complexities do not match the memory and query complexities stated by [33] for their algorithm.\n\n1.1 Additional Related Work\n\nThe work on (offline) maximization of a monotone submodular function subject to a matroid constraint goes back to the classical result of Fisher et al. [16], who showed that the natural greedy algorithm gives an approximation ratio of 2 for this problem. Later, an algorithm with an improved approximation ratio of $e/(e - 1)$ was found for this problem [6], which is the best that can be done in polynomial time [35]. In contrast, the corresponding optimization problem for non-monotone submodular objectives is much less well understood. 
After a long series of works [11, 13, 25, 36, 43], the current best approximation ratio for this problem is 2.598 [3], which is still far from the state-of-the-art inapproximability result of 2.093 for this problem due to [36].\n\nSeveral works have considered (offline) maximization of both monotone and non-monotone submodular functions subject to constraint families generalizing matroid constraints, including intersections of $p$ matroid constraints [26], $p$-exchange system constraints [14, 45], $p$-extendible system constraints [15] and $p$-system constraints [15, 16, 19, 30]. We note that the first of these families is a subset of the $p$-matchoid constraints studied by the current work, while the last two families generalize $p$-matchoid constraints. Moreover, the state-of-the-art approximation ratios for all these families of constraints are $p \\pm O(\\sqrt{p})$ for both monotone and non-monotone submodular objectives.\n\nThe study of submodular maximization in the streaming setting has been mostly surveyed above. However, we would like to note that besides the above-mentioned results, there are also a few works on submodular maximization in the sliding window variant of the streaming setting [9, 12, 44].\n\n1.2 Our Technique\n\nTechnically, our algorithm is equivalent to dismissing every element of the stream with an appropriate probability and then feeding the elements that have not been dismissed into the deterministic algorithm of [8] for maximizing a monotone submodular function subject to a $p$-matchoid constraint. The random dismissal of elements gives the algorithm two advantages. First, it makes the algorithm faster because there is no need to process the dismissed elements. Second, it is well known that such a dismissal often transforms an algorithm for monotone submodular objectives into an algorithm with some approximation guarantee also for non-monotone objectives. 
However, besides these important advantages, dismissing elements at random also has an obvious drawback, namely, the dismissed elements are likely to include a significant fraction of the value of the optimal solution. The crux of our analysis is showing that this loss of value due to the random dismissal of elements does not affect the approximation ratio. To do so, we prove a stronger version of a structural lemma regarding graphs and matroids (see Proposition 10) that was implicitly proved by [42] and later stated explicitly by [8]. This proposition provides a mapping from the elements of the optimal solution to elements of the solution $S$ chosen by our algorithm. This mapping helps us show that the value of the elements of the optimal solution that do not belong to the set $S$ is not too large compared to the value of $S$ itself. In this way, the stronger lemma we prove translates into an improvement in the bound on the performance of the algorithm, which is not sufficient to improve the guaranteed approximation ratio but, fortunately, is good enough to counterbalance the loss due to the random dismissal of elements.\n\nWe would like to note that the general technique of dismissing elements at random and then running an algorithm for monotone submodular objectives on the remaining elements was previously used by [15] in the context of offline algorithms. However, the method we use in this work to counterbalance the loss of value due to the random dismissal of streaming elements is completely unrelated to the way this was achieved in [15].\n\n2 Preliminaries\n\nIn this section, we introduce some notation and definitions that we later use to formally state our results. A set function $f : 2^{\\mathcal{N}} \\to \\mathbb{R}$ on a ground set $\\mathcal{N}$ is non-negative if $f(S) \\geq 0$ for every $S \\subseteq \\mathcal{N}$, monotone if $f(S) \\leq f(T)$ for every $S \\subseteq T \\subseteq \\mathcal{N}$, and submodular if $f(S) + f(T) \\geq f(S \\cup T) + f(S \\cap T)$ for every $S, T \\subseteq \\mathcal{N}$. Intuitively, a submodular function is a function that obeys the property of diminishing returns, i.e., the marginal contribution of adding an element to a set diminishes as the set becomes larger and larger. Unfortunately, it is somewhat difficult to relate this intuition to the above (quite cryptic) definition of submodularity, and therefore, a friendlier equivalent definition of submodularity is often used. However, to present this equivalent definition in a simple form, we need some notation. Given a set $S$ and an element $u$, we denote by $S + u$ and $S - u$ the union $S \\cup \\{u\\}$ and the difference $S \\setminus \\{u\\}$, respectively. Additionally, the marginal contribution of $u$ to the set $S$ under the set function $f$ is written as $f(u \\mid S) \\triangleq f(S + u) - f(S)$. Using this notation, we can now state the above-mentioned equivalent definition of submodularity, which is that a set function $f$ is submodular if and only if\n$$f(u \\mid S) \\geq f(u \\mid T) \\qquad \\forall\\, S \\subseteq T \\subseteq \\mathcal{N} \\text{ and } u \\in \\mathcal{N} \\setminus T.$$\nOccasionally, we also refer to the marginal contribution of a set $T$ to a set $S$ (under a set function $f$), which we write as $f(T \\mid S) \\triangleq f(S \\cup T) - f(S)$.\n\nA set system is a pair $(\\mathcal{N}, \\mathcal{I})$, where $\\mathcal{N}$ is the ground set of the set system and $\\mathcal{I} \\subseteq 2^{\\mathcal{N}}$ is the set of independent sets of the set system. A matroid is a set system which obeys three properties: (i) the empty set is independent, (ii) if $S \\subseteq T \\subseteq \\mathcal{N}$ and $T$ is independent, then so is $S$, and finally, (iii) if $S$ and $T$ are two independent sets obeying $|S| < |T|$, then there exists an element $u \\in T \\setminus S$ such that $S + u$ is independent. 
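The diminishing-returns characterization of submodularity stated above can be checked by brute force on a tiny ground set. The following Python sketch does so for a hypothetical coverage function (each element covers a few topics, and f(S) counts the distinct topics covered by S). The names COVERS, marginal, is_submodular and is_monotone are illustrative only, and the exhaustive enumeration is exponential, so it is meant for toy examples and not as part of the algorithms discussed in this paper.

```python
from itertools import combinations

# Hypothetical toy instance: each element covers a set of topics.
COVERS = {'a': {1, 2}, 'b': {2, 3}, 'c': {3, 4, 5}, 'd': {1, 5}}
GROUND = frozenset(COVERS)


def f(S):
    # Coverage function: number of distinct topics covered by S.
    # It is non-negative, monotone and submodular.
    covered = set()
    for u in S:
        covered |= COVERS[u]
    return len(covered)


def marginal(fn, u, S):
    # f(u | S) = f(S + u) - f(S)
    return fn(set(S) | {u}) - fn(set(S))


def subsets(X):
    # All subsets of X, as frozensets (exponentially many).
    X = sorted(X)
    for r in range(len(X) + 1):
        for combo in combinations(X, r):
            yield frozenset(combo)


def is_submodular(fn, ground):
    # Diminishing returns: f(u | S) >= f(u | T) whenever S is a subset of T,
    # T is a subset of the ground set and u lies outside T.
    for T in subsets(ground):
        for S in subsets(T):
            for u in ground - T:
                if marginal(fn, u, S) < marginal(fn, u, T):
                    return False
    return True


def is_monotone(fn, ground):
    # f(S) <= f(T) whenever S is a subset of T.
    return all(fn(S) <= fn(T) for T in subsets(ground) for S in subsets(T))
```

With these definitions, marginal(f, 'c', set()) equals 3 while marginal(f, 'c', {'a', 'b'}) equals 2, in line with diminishing returns, whereas the supermodular function lambda S: len(S) ** 2 fails is_submodular.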
In the following lines we define two matroid-related terms that we use often in our proofs; readers who are not familiar with matroid theory should consider reading a more extensive presentation of matroids, such as the one given by [38, Volume B]. A cycle of a matroid is an inclusion-wise minimal dependent set, and an element $u$ is spanned by a set $S$ if the maximum-size independent subsets of $S$ and $S + u$ are of the same size. Note that it follows from these definitions that every element $u$ of a cycle $C$ is spanned by $C - u$.\n\nA set system $(\\mathcal{N}, \\mathcal{I})$ is a $p$-matchoid, for some positive integer $p$, if there exist $m$ matroids $(\\mathcal{N}_1, \\mathcal{I}_1), (\\mathcal{N}_2, \\mathcal{I}_2), \\ldots, (\\mathcal{N}_m, \\mathcal{I}_m)$ such that every element of $\\mathcal{N}$ appears in the ground set of at most $p$ of these matroids and $\\mathcal{I} = \\{S \\subseteq \\mathcal{N} \\mid S \\cap \\mathcal{N}_i \\in \\mathcal{I}_i \\text{ for every } 1 \\leq i \\leq m\\}$. A simple example of a 2-matchoid is $b$-matching. Recall that a set $E$ of edges of a graph is a $b$-matching if and only if every vertex $v$ of the graph is hit by at most $b(v)$ edges of $E$, where $b$ is a function assigning integer values to the vertices. The corresponding 2-matchoid $\\mathcal{M}$ has the set of edges of the graph as its ground set and a matroid for every vertex of the graph, where the matroid $\\mathcal{M}_v$ of a vertex $v$ has in its ground set only the edges hitting $v$, and a set $E$ of edges is independent in $\\mathcal{M}_v$ if and only if $|E| \\leq b(v)$. Since every edge hits only two vertices, it appears in the ground sets of only two vertex matroids, and thus $\\mathcal{M}$ is indeed a 2-matchoid. Moreover, one can verify that a set of edges is independent in $\\mathcal{M}$ if and only if it is a valid $b$-matching.\n\nThe problem of maximizing a set function $f : 2^{\\mathcal{N}} \\to \\mathbb{R}$ subject to a $p$-matchoid constraint $\\mathcal{M} = (\\mathcal{N}, \\mathcal{I})$ asks us to find an independent set $S \\in \\mathcal{I}$ maximizing $f(S)$. In the streaming setting, we assume that the elements of $\\mathcal{N}$ arrive sequentially in some adversarially chosen order, and the algorithm learns about each element only when it arrives. 
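The vertex-matroid construction for b-matchings described above can be sketched in a few lines of Python. The graph, the capacities and the helper names are hypothetical, and each vertex matroid is stored simply as a pair (edges hitting v, capacity b(v)), which suffices because every vertex matroid here is a uniform matroid.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy graph: edges are vertex pairs, B assigns a capacity to each vertex.
EDGES = [('x', 'y'), ('y', 'z'), ('x', 'z'), ('z', 'w')]
B = {'x': 1, 'y': 1, 'z': 2, 'w': 1}


def vertex_matroids(edges, b):
    # One matroid M_v per vertex v: its ground set holds the edges hitting v,
    # and a set of edges is independent in M_v iff it has at most b(v) of them.
    return {v: ({e for e in edges if v in e}, cap) for v, cap in b.items()}


def matchoid_p(matroids, edges):
    # p = the largest number of matroid ground sets that a single edge belongs to.
    return max(sum(e in ground for ground, _ in matroids.values()) for e in edges)


def is_independent(S, matroids):
    # Matchoid independence: the part of S inside every ground set must be
    # independent in the corresponding matroid.
    return all(len(set(S) & ground) <= cap for ground, cap in matroids.values())


def is_b_matching(S, b):
    # Direct definition: every vertex v is hit by at most b(v) edges of S.
    hits = Counter(v for e in S for v in e)
    return all(hits[v] <= b[v] for v in hits)


def check_equivalence(edges, b):
    # Brute force over all edge subsets: matchoid independence equals b-matching.
    matroids = vertex_matroids(edges, b)
    return all(
        is_independent(S, matroids) == is_b_matching(S, b)
        for r in range(len(edges) + 1)
        for S in combinations(edges, r)
    )
```

For the toy graph, every edge lies in exactly two vertex matroids, so matchoid_p returns 2, and check_equivalence confirms on all 16 edge subsets that independence in the matchoid coincides with the b-matching condition.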
The objective of an algorithm in this setting is to maintain a set $S \\in \\mathcal{I}$ which approximately maximizes $f$, and to do so with as little memory as possible. In particular, we are interested in algorithms whose memory requirement does not depend on the size of the ground set $\\mathcal{N}$, which means that they cannot keep in their memory all the elements that have arrived so far. Our two main results for this setting are given by the following theorems. Recall that $k$ is the size of the largest independent set and $m$ is the number of matroids used to define the $p$-matchoid constraint.\n\nTheorem 1. There is a streaming $(2p + 2\\sqrt{p(p + 1)} + 1)$-approximation algorithm for the problem of maximizing a non-negative submodular function $f$ subject to a $p$-matchoid constraint whose space complexity is $O(k)$. Moreover, in expectation, this algorithm uses $O(km/p)$ value and independence oracle queries when processing each arriving element.\n\nTheorem 2. There is a streaming $4p$-approximation algorithm for the problem of maximizing a non-negative monotone submodular function $f$ subject to a $p$-matchoid constraint whose space complexity is $O(k)$. Moreover, in expectation, this algorithm uses $O(km/p)$ value and independence oracle queries when processing each arriving element.\n\n3 Algorithm\n\nIn this section we prove Theorems 1 and 2. Throughout this section we assume that $f$ is a non-negative submodular function over the ground set $\\mathcal{N}$, and $\\mathcal{M} = (\\mathcal{N}, \\mathcal{I})$ is a $p$-matchoid over the same ground set which is defined by the matroids $(\\mathcal{N}_1, \\mathcal{I}_1), (\\mathcal{N}_2, \\mathcal{I}_2), \\ldots, (\\mathcal{N}_m, \\mathcal{I}_m)$. Additionally, we denote by $u_1, u_2, \\ldots, u_n$ the elements of $\\mathcal{N}$ in the order in which they arrive. Finally, for an element $u_i \\in \\mathcal{N}$ and sets $S, T \\subseteq \\mathcal{N}$, we use the shorthands $f(u_i : S) = f(u_i \\mid S \\cap \\{u_1, u_2, \\ldots, u_{i-1}\\})$ and $f(T : S) = \\sum_{u \\in T} f(u : S)$. 
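To make the shorthand f(u : S) and the two procedures presented next in this section (Algorithms 1 and 2) concrete, here is a minimal Python sketch. It assumes that f is a Python callable on sets, that each matroid of the p-matchoid is given as a pair (ground set, independence-oracle callable), and that randomness comes from a random.Random instance. All names and the demo instance are hypothetical, and this is an unoptimized reading of the pseudocode rather than the authors' implementation. The helper parameters reproduces the choices of c and q made in the proofs of Theorems 1 and 2 below.

```python
import math
import random


def f_colon(f, u, S, arrival):
    # f(u : S): marginal contribution of u to the part of S that arrived before u.
    before = {x for x in S if arrival[x] < arrival[u]}
    return f(before | {u}) - f(before)


def exchange_candidate(f, S, u, matroids, arrival):
    # Algorithm 1. Each matroid is a pair (ground_set, indep), where indep(T)
    # is an independence oracle for subsets T of ground_set.
    U = set()
    for ground, indep in matroids:
        if not indep((S | {u}) & ground):
            X = [x for x in S if indep(((S - {x}) | {u}) & ground)]
            U.add(min(X, key=lambda x: f_colon(f, x, S, arrival)))
    return U


def sample_streaming(f, stream, matroids, q, c, rng):
    # Algorithm 2. arrival keeps indices only for u and the current members
    # of S, so the sketch never stores the whole stream.
    S, arrival = set(), {}
    for i, u in enumerate(stream):
        if rng.random() >= q:
            continue  # u is dismissed with probability 1 - q
        arrival[u] = i
        U = exchange_candidate(f, S, u, matroids, arrival)
        gain = f(S | {u}) - f(S)  # f(u | S_{i-1})
        if gain >= (1 + c) * sum(f_colon(f, x, S, arrival) for x in U):
            S = (S - U) | {u}
            for x in U:
                del arrival[x]
        else:
            del arrival[u]
    return S


def parameters(p, monotone):
    # (c, q, approximation ratio) as chosen in the proofs of Theorems 2 and 1.
    c = 1.0 if monotone else math.sqrt(1 + 1 / p)
    q = 1 / ((1 + c) * p + 1)
    ratio = 4 * p if monotone else 2 * p + 2 * math.sqrt(p * (p + 1)) + 1
    return c, q, ratio


# Hypothetical demo instance: a coverage objective and three cardinality-capped
# (uniform) matroids forming a 2-matchoid over the elements 0..5.
DEMO_COVERS = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {1, 5}, 4: {6}, 5: {4, 6, 7}}


def demo_f(S):
    return len(set().union(*(DEMO_COVERS[u] for u in S)))


def cap(k):
    return lambda T: len(T) <= k


DEMO_MATROIDS = [({0, 1, 2}, cap(1)), ({2, 3, 4, 5}, cap(2)), ({0, 3, 5}, cap(1))]

if __name__ == '__main__':
    c, q, ratio = parameters(p=2, monotone=True)
    S = sample_streaming(demo_f, range(6), DEMO_MATROIDS, q, c, random.Random(1))
    print(sorted(S), demo_f(S), ratio)
```

With q = 1 no element is dismissed, so the demo instance is processed deterministically; with the monotone parameters for p = 2 (c = 1 and q = 1/5), in expectation four out of five elements are skipped without any oracle query, which is where the O(qkm) bound of Observation 3 comes from.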
Intuitively, $f(u : S)$ is the marginal contribution of $u$ to the part of $S$ that arrived before $u$ itself.\n\nLet us now present the algorithm we use to prove our results. This algorithm uses a procedure named EXCHANGE-CANDIDATE, which also appeared in previous works, sometimes under the exact same name. EXCHANGE-CANDIDATE gets an independent set $S$ and an element $u$, and its role is to output a set $U \\subseteq S$ such that $S \\setminus U + u$ is independent. The pseudocode of EXCHANGE-CANDIDATE is given as Algorithm 1.\n\nAlgorithm 1: EXCHANGE-CANDIDATE$(S, u)$\n  Let $U \\gets \\emptyset$.\n  for $\\ell = 1$ to $m$ do\n    if $(S + u) \\cap \\mathcal{N}_\\ell \\notin \\mathcal{I}_\\ell$ then\n      Let $X_\\ell \\gets \\{x \\in S \\mid ((S - x + u) \\cap \\mathcal{N}_\\ell) \\in \\mathcal{I}_\\ell\\}$.\n      Let $x_\\ell \\gets \\arg\\min_{x \\in X_\\ell} f(x : S)$.\n      Update $U \\gets U + x_\\ell$.\n  return $U$.\n\nAlgorithm 2: SAMPLE-STREAMING\n  Let $S_0 \\gets \\emptyset$.\n  for every arriving element $u_i$ do\n    Let $S_i \\gets S_{i-1}$.\n    with probability $q$ do\n      Let $U_i \\gets$ EXCHANGE-CANDIDATE$(S_{i-1}, u_i)$.\n      if $f(u_i \\mid S_{i-1}) \\geq (1 + c) \\cdot f(U_i : S_{i-1})$ then Let $S_i \\gets S_{i-1} \\setminus U_i + u_i$.\n  return $S_n$.\n\nOur algorithm, which uses the procedure EXCHANGE-CANDIDATE, is given as Algorithm 2. This algorithm has two parameters, a probability $q$ and a value $c > 0$. Whenever the algorithm gets a new element $u$, it dismisses it with probability $1 - q$. Otherwise, it uses EXCHANGE-CANDIDATE to find a set $U$ of elements whose removal from the current solution maintained by the algorithm allows the addition of $u$ to this solution. If the marginal contribution of adding $u$ to the solution is large enough compared to the value of the elements of $U$, then $u$ is added to the solution and the elements of $U$ are removed. While reading the pseudocode of the algorithm, keep in mind that $S_i$ represents the solution of the algorithm after $i$ elements have been processed.\n\nObservation 3. 
Algorithm 2 can be implemented using $O(k)$ memory and, in expectation, $O(qkm)$ value and independence oracle queries per arriving element.\n\nThe following technical theorem is the main tool that we use to analyze the approximation ratio of Algorithm 2; its proof can be found in Appendix A. Let $OPT$ be an independent set of $\\mathcal{M}$ maximizing $f$, and let $A$ be the set of elements that ever appeared in the solution maintained by Algorithm 2; formally, $A = \\bigcup_{i=1}^{n} S_i$.\n\nTheorem 4. Assuming $q^{-1} = (1 + c)p + 1$, $\\mathbb{E}[f(S_n)] \\geq \\frac{c}{(1 + c)^2 p} \\cdot \\mathbb{E}[f(A \\cup OPT)]$.\n\nProving our result for monotone functions (Theorem 2) is now straightforward.\n\nProof of Theorem 2. By plugging $c = 1$ and $q^{-1} = 2p + 1$ into Algorithm 2, we get an algorithm which uses $O(k)$ memory and $O(km/p)$ oracle queries by Observation 3. Additionally, by Theorem 4, this algorithm obeys\n$$\\mathbb{E}[f(S_n)] \\geq \\frac{c}{(1 + c)^2 p} \\cdot \\mathbb{E}[f(A \\cup OPT)] = \\frac{1}{4p} \\cdot \\mathbb{E}[f(A \\cup OPT)] \\geq \\frac{1}{4p} \\cdot f(OPT),$$\nwhere the last inequality follows from the monotonicity of $f$. Thus, the approximation ratio of the algorithm we got is at most $4p$.\n\nProving our result for non-monotone functions is a bit more involved and requires the following known lemma.\n\nLemma 5 (Lemma 2.2 of [4]). Let $g : 2^{\\mathcal{N}} \\to \\mathbb{R}_{\\geq 0}$ be a non-negative submodular function, and let $B$ be a random subset of $\\mathcal{N}$ containing every element of $\\mathcal{N}$ with probability at most $q$ (not necessarily independently). Then $\\mathbb{E}[g(B)] \\geq (1 - q) \\cdot g(\\emptyset)$.\n\nProof of Theorem 1. By plugging $c = \\sqrt{1 + 1/p}$ and $q^{-1} = p + \\sqrt{p(p + 1)} + 1$ into Algorithm 2, we get an algorithm which uses $O(k)$ memory and $O(km/p)$ oracle queries by Observation 3. Additionally, by Theorem 4, this algorithm obeys\n$$\\mathbb{E}[f(S_n)] \\geq \\frac{c}{(1 + c)^2 p} \\cdot \\mathbb{E}[f(A \\cup OPT)].$$\nLet us now define $g : 2^{\\mathcal{N}} \\to \\mathbb{R}_{\\geq 0}$ to be the function $g(S) = f(S \\cup OPT)$. Note that $g$ is non-negative and submodular. 
Thus, by Lemma 5 and the fact that $A$ contains every element with probability at most $q$ (because Algorithm 2 accepts an element into its solution with at most this probability), we get\n$$\\mathbb{E}[f(A \\cup OPT)] = \\mathbb{E}[g(A)] \\geq (1 - q) \\cdot g(\\emptyset) = \\frac{p + \\sqrt{p(p + 1)}}{p + \\sqrt{p(p + 1)} + 1} \\cdot f(OPT) = \\frac{p + \\sqrt{p(p + 1)}}{\\sqrt{1 + 1/p} \\cdot (p + \\sqrt{p(p + 1)})} \\cdot f(OPT) = \\frac{1}{c} \\cdot f(OPT).$$\nCombining the two above inequalities, we get\n$$\\mathbb{E}[f(S_n)] \\geq \\frac{f(OPT)}{(1 + c)^2 p} = \\frac{f(OPT)}{(2 + 2\\sqrt{1 + 1/p} + 1/p)p} = \\frac{f(OPT)}{2p + 2\\sqrt{p(p + 1)} + 1}.$$\nThus, the approximation ratio of the algorithm we got is at most $2p + 2\\sqrt{p(p + 1)} + 1$.\n\n4 Experiment\n\nIn this section, we investigate the performance of our algorithm on two data summarization applications. In the first part, we replicate the exact setting of Mirzasoleiman et al. [33] and compare the performance of our algorithm in this setting with the performance of the algorithm of Mirzasoleiman et al. [33]. Unfortunately, to allow such a comparison, we had to resort to the relatively small datasets that existing algorithms can handle. Interestingly, however, despite the small size of these datasets, we could still observe the superiority of our method over the state-of-the-art. In the second part, we investigate the scalability of our algorithm to larger datasets.\n\n4.1 Video Summarization\n\nIn this section, we evaluate the performance of our algorithm (SAMPLE-STREAMING) on a video summarization task and compare it with SEQDPP [18]\u00b3 and LOCAL-SEARCH [33].\u2074 For our experiments, we use the Open Video Project (OVP) and the YouTube datasets, which have 50 and 39 videos, respectively [10].\n\nA determinantal point process (DPP) is a powerful method to capture diversity in datasets [24, 29]. Let $\\mathcal{N} = \\{1, 2, \\ldots, n\\}$ be a ground set of $n$ items. 
A DPP defines a probability distribution over all subsets of $\\mathcal{N}$, and a random variable $Y$ distributed according to this distribution obeys $\\Pr[Y = S] = \\frac{\\det(L_S)}{\\det(I + L)}$ for every set $S \\subseteq \\mathcal{N}$, where $L$ is a positive semidefinite kernel matrix, $L_S$ is the principal sub-matrix of $L$ indexed by $S$ and $I$ is the $n \\times n$ identity matrix. The most diverse subset of $\\mathcal{N}$ is the one with the maximum probability in this distribution. Unfortunately, finding this set is NP-hard [22], but the function $f(S) = \\log \\det(L_S)$ is a non-monotone submodular function [24].\n\n\u00b3 https://github.com/pujols/Video-summarization\n\u2074 https://github.com/baharanm/non-mon-stream\n\nFigure 1: Comparing the normalized objective value and running time of SAMPLE-STREAMING and LOCAL-SEARCH for different segment sizes. [Panels (a) YouTube videos and (b) OVP videos: normalized objective value vs. segment size (10 to 20) for Sample-Streaming and Local-Search. Panel (c) Number of oracle calls: oracle calls (log scale) vs. segment size for Sampling-OVP, Sampling-YouTube, Local-OVP and Local-YouTube.]\n\nFigure 2: Summary generated by SAMPLE-STREAMING for OVP video number 60.\n\nWe follow the experimental setup of [18] for extracting frames from videos, finding a linear kernel matrix $L$ and evaluating the quality of produced summaries based on their F-score. Gong et al. [18] define a sequential DPP, where each video sequence is partitioned into disjoint segments of equal sizes. For selecting a subset $S_t$ from each segment $t$ (i.e., set $P_t$), a DPP is defined on the union of the frames in this segment and the selected frames $S_{t-1}$ from the previous segment. Therefore, the conditional distribution of $S_t$ is given by\n$$\\Pr[S_t \\mid S_{t-1}] = \\frac{\\det(L_{S_t \\cup S_{t-1}})}{\\det(I_t + L)},$$\nwhere $L$ is the kernel matrix defined over $P_t \\cup S_{t-1}$, and $I_t$ is a diagonal matrix of the same size as $P_t \\cup S_{t-1}$ in which the elements corresponding to $S_{t-1}$ are zeros and the elements corresponding to $P_t$ are ones. For a detailed explanation, please refer to [18]. In our experiments, we focus on maximizing the non-monotone submodular function $f(S_t) = \\log \\det(L_{S_t \\cup S_{t-1}})$. We would like to point out that this function can take negative values, which is slightly different from the non-negativity condition we need for our theoretical guarantees.\n\nWe first compare the objective values (F-scores) of SAMPLE-STREAMING and LOCAL-SEARCH for different segment sizes over the YouTube and OVP datasets. In each experiment, the values are normalized to the F-score of the summaries generated by SEQDPP. While SEQDPP has the best performance in terms of maximizing the objective value, in Figures 1(a) and 1(b) we observe that both SAMPLE-STREAMING and LOCAL-SEARCH produce summaries of very high quality. Figure 2 shows the summary produced by our algorithm for OVP video number 60. Mirzasoleiman et al. [33] showed that their algorithm (LOCAL-SEARCH) runs three orders of magnitude faster than SEQDPP [18]. In our experiments (see Figure 1(c)), we observed that SAMPLE-STREAMING is 40 and 50 times faster than LOCAL-SEARCH for the YouTube and OVP datasets, respectively. 
Note that for different segment sizes the number of frames remains constant; therefore, the time complexities of both SAMPLE-STREAMING and LOCAL-SEARCH do not change.\n\nIn a second experiment, we study the effect of imposing different constraints on the video summarization task for YouTube video number 106, which is part of the America\u2019s Got Talent series. In the first set of constraints, we consider 6 partition matroids (one for each of the 6 different faces in the frames) that limit the number of frames containing each face $i$, i.e., a 6-matchoid constraint\u2075 $\\mathcal{I} = \\{S \\subseteq \\mathcal{N} : |S \\cap \\mathcal{N}_i| \\leq k_i \\text{ for every } 1 \\leq i \\leq 6\\}$, where $\\mathcal{N}_i \\subseteq \\mathcal{N}$ is the set of frames containing face $i$. For all values of $i$, we set $k_i = 3$. In this experiment, we use the same methods as described by Mirzasoleiman et al. [33] for face recognition. Figure 3(a) shows the summary produced for this task. The second set of constraints is a 3-matchoid, where the matroids limit the number of frames containing each one of the three judges. The summary for this constraint is shown in Figure 3(b). Finally, Figure 3(c) shows a summary with a single partition matroid constraint on the singer.\n\n\u2075 Note that a frame may contain more than one face.\n\nFigure 3: Summaries generated by SAMPLE-STREAMING for YouTube video number 106: (a) a 6-matchoid constraint, (b) a 3-matchoid constraint and (c) a partition matroid constraint.\n\n4.2 Location Summarization\n\nIn this section, given a dataset of 504,247 Uber pick-ups in Manhattan, New York in April 2014 [41], our goal is to find a set of the most representative locations. 
This dataset allows us to study the effect of $p$ and $k$ (the size of the largest feasible solution) on the performance of our algorithm. To do so, the entire area of the given pick-ups was covered by $m = 166$ overlapping circular regions of radius $r$ (the centers of these regions provided a 1 km-cover of the whole area, i.e., for each location in the dataset there was at least one center within a distance of 1 km from it), and the algorithm was allowed to choose at most $\\ell$ locations out of each one of these regions. One can observe that, by using a single matroid for limiting the number of locations chosen within each one of the regions, the above constraint can be expressed as a $p$-matchoid constraint, where $p$ is the maximum number of regions a single location can belong to (notice that $p$ could be much smaller than the total number $m$ of regions).\n\nIn order to find a representative set $S$, we use the following monotone submodular objective function: $f(S) = \\log \\det(I + \\alpha K_{S,S})$, where the matrix $K$ encodes the similarities between data points, $K_{S,S}$ is the principal sub-matrix of $K$ indexed by $S$ and $\\alpha > 0$ is a regularization parameter [20, 23, 39]. The similarity of two location samples $i$ and $j$ is defined by a Gaussian kernel $K_{i,j} = \\exp(-d_{i,j}^2 / h^2)$, where the distance $d_{i,j}$ (in meters) is calculated from the coordinates and $h$ is set to 5000.\n\nIn the first experiment, we set the radius of the regions to $r = 1.5$ km. In this setting, we observed that a point belongs to at most 7 regions; hence, the constraint is a 7-matchoid. 
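The objective and the region constraint of this experiment can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code used for the experiments: points are taken to be planar coordinates in meters (the text only says that distances are calculated from the coordinates), the regularization parameter alpha defaults to 1 as a placeholder because its value is not given here, and all helper names are hypothetical.

```python
import numpy as np


def gaussian_kernel(points, h=5000.0):
    # K[i, j] = exp(-d_ij^2 / h^2). Assumption: points are planar (x, y)
    # coordinates in meters, so d_ij is the Euclidean distance.
    diff = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / h ** 2)


def logdet_objective(K, S, alpha=1.0):
    # f(S) = log det(I + alpha * K_{S,S}); alpha = 1.0 is a placeholder value.
    idx = sorted(S)
    if not idx:
        return 0.0  # log det of the empty (0 x 0) identity matrix
    sign, value = np.linalg.slogdet(np.eye(len(idx)) + alpha * K[np.ix_(idx, idx)])
    return float(value)  # sign is +1 since I + alpha * K_SS is positive definite


def region_membership(points, centers, r):
    # Boolean matrix whose entry (i, j) says whether point i lies in region j.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return dist <= r


def matchoid_p(points, centers, r):
    # p = the largest number of regions that a single location belongs to.
    return int(region_membership(points, centers, r).sum(axis=1).max())


def is_feasible(S, points, centers, r, ell):
    # One uniform matroid per region: at most ell chosen locations in each region.
    idx = sorted(S)
    if not idx:
        return True
    counts = region_membership(points[idx], centers, r).sum(axis=0)
    return bool((counts <= ell).all())
```

Here matchoid_p computes the p of the resulting matchoid (7 for r = 1.5 km on the real data, according to the text above), and is_feasible plays the role of the independence oracle for the one-matroid-per-region constraint.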
With r = 1.5 km and ℓ = 5, it took 116 seconds⁶ (and 693,717 oracle calls) for our algorithm to find a summary of size k = 153. For ℓ = 10 and ℓ = 20, it took 294 seconds (1,306,957 oracle calls) and 1004 seconds (2,367,389 oracle calls), respectively, to produce summaries of sizes 301 and 541.

⁶ In these experiments, we used a machine with an Intel i5 3.2 GHz processor and 16 GB of RAM.

In the second experiment, we set the radius of the regions to r = 2.5 km to investigate the performance of our algorithm on p-matchoids with larger values of p. In this setting, we observed that a point belongs to at most 17 regions, which makes the constraint a 17-matchoid. This time, for ℓ = 5, it took only 35 seconds (296,023 oracle calls) for our algorithm to find a summary of size k = 54. For ℓ = 10 and ℓ = 20, it took 80 seconds (526,839 oracle calls) and 176 seconds (958,549 oracle calls), respectively, to produce summaries of sizes 106 and 198.

These results show that our algorithm scales well to large datasets: even the largest summary (541 locations) was computed in under 17 minutes. Moreover, for p-matchoids with larger p (which result in a smaller sampling probability q), the running time and the number of oracle calls are even lower.

5 Conclusion

We developed a streaming algorithm for submodular maximization that carefully subsamples elements of the data stream. Our algorithm provides the best of three worlds: (i) the tightest approximation guarantees in various settings, including p-matchoid and matroid constraints for non-monotone submodular functions; (ii) the minimum memory requirement; and (iii) the fewest queries per element. We also studied the effectiveness of our algorithm experimentally.

Acknowledgements.
The work of Amin Karbasi was supported by an AFOSR Young Investigator Award (FA9550-18-1-0160).

References

[1] Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: massive data summarization on the fly. In KDD, pages 671–680, 2014.
[2] Jeffrey A. Bilmes and Wenruo Bai. Deep Submodular Functions. CoRR, abs/1701.08939, 2017. URL http://arxiv.org/abs/1701.08939.
[3] Niv Buchbinder and Moran Feldman. Constrained submodular maximization via a non-symmetric technique. CoRR, abs/1611.03253, 2016. URL http://arxiv.org/abs/1611.03253.
[4] Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In SODA, pages 1433–1452, 2014.
[5] Niv Buchbinder, Moran Feldman, and Roy Schwartz. Online submodular maximization with preemption. In SODA, pages 1202–1216, 2015.
[6] Gruia Călinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput., 40(6):1740–1766, 2011.
[7] Amit Chakrabarti and Sagar Kale. Submodular maximization meets streaming: matchings, matroids, and more. Math. Program., 154(1-2):225–247, 2015.
[8] Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. Streaming algorithms for submodular function maximization. In ICALP, pages 318–330, 2015.
[9] Jiecao Chen, Huy L. Nguyen, and Qin Zhang. Submodular maximization over sliding windows. CoRR, abs/1611.00129, 2016. URL http://arxiv.org/abs/1611.00129.
[10] Sandra Eliza Fontes De Avila, Ana Paula Brandão Lopes, Antonio da Luz Jr, and Arnaldo de Albuquerque Araújo. VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method. Pattern Recognition Letters, 32(1):56–68, 2011.
[11] Alina Ene and Huy L. Nguyen. Constrained submodular maximization: Beyond 1/e.
In FOCS, pages 248–257, 2016.
[12] Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, and Morteza Zadimoghaddam. Submodular optimization over sliding windows. In WWW, pages 421–430, 2017.
[13] Moran Feldman, Joseph Naor, and Roy Schwartz. A unified continuous greedy algorithm for submodular maximization. In FOCS, pages 570–579, 2011.
[14] Moran Feldman, Joseph Naor, Roy Schwartz, and Justin Ward. Improved approximations for k-exchange systems (extended abstract). In ESA, pages 784–798, 2011.
[15] Moran Feldman, Christopher Harshaw, and Amin Karbasi. Greed is good: Near-optimal submodular maximization via greedy optimization. In COLT, pages 758–784, 2017.
[16] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximations for maximizing submodular set functions – II. Mathematical Programming Study, 8:73–87, 1978.
[17] Ryan Gomes and Andreas Krause. Budgeted nonparametric learning from data streams. In ICML, pages 391–398, 2010.
[18] Boqing Gong, Wei-Lun Chao, Kristen Grauman, and Fei Sha. Diverse sequential subset selection for supervised video summarization. In NIPS, pages 2069–2077, 2014.
[19] Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In WINE, pages 246–257, 2010.
[20] Ralf Herbrich, Neil D. Lawrence, and Matthias Seeger. Fast sparse Gaussian process methods: The informative vector machine. In NIPS, pages 625–632, 2003.
[21] Ehsan Kazemi, Morteza Zadimoghaddam, and Amin Karbasi. Scalable deletion-robust submodular maximization: Data summarization with privacy and fairness constraints. In ICML, pages 2549–2558, 2018.
[22] Chun-Wa Ko, Jon Lee, and Maurice Queyranne. An exact algorithm for maximum entropy sampling.
Operations Research, 43(4):684–691, 1995.
[23] Andreas Krause and Carlos Guestrin. Near-optimal nonmyopic value of information in graphical models. In UAI, pages 324–331, 2005.
[24] Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2–3), 2012.
[25] Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Maximizing nonmonotone submodular functions under matroid or knapsack constraints. SIAM J. Discrete Math., 23(4):2053–2078, 2010.
[26] Jon Lee, Maxim Sviridenko, and Jan Vondrák. Submodular maximization over multiple matroids via generalized exchange properties. Math. Oper. Res., 35(4):795–806, 2010.
[27] Maxwell W. Libbrecht, Jeffrey A. Bilmes, and William Stafford Noble. Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization. Proteins: Structure, Function, and Bioinformatics, 2018.
[28] Hui Lin and Jeff A. Bilmes. A class of submodular functions for document summarization. In HLT, pages 510–520, 2011.
[29] Odile Macchi. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1):83–122, 1975.
[30] Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. Fast constrained submodular maximization: Personalized data summarization. In ICML, pages 1358–1367, 2016.
[31] Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. Distributed submodular maximization. Journal of Machine Learning Research, 17:238:1–238:44, 2016.
[32] Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Deletion-robust submodular maximization: Data summarization with “the right to be forgotten”.
In ICML, pages 2449–2458, 2017.
[33] Baharan Mirzasoleiman, Stefanie Jegelka, and Andreas Krause. Streaming Non-Monotone Submodular Maximization: Personalized Video Summarization on the Fly. In AAAI, 2018.
[34] Marko Mitrovic, Ehsan Kazemi, Morteza Zadimoghaddam, and Amin Karbasi. Data Summarization at Scale: A Two-Stage Submodular Approach. In ICML, pages 3593–3602, 2018.
[35] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.
[36] Shayan Oveis Gharan and Jan Vondrák. Submodular maximization by simulated annealing. In SODA, pages 1098–1116, 2011.
[37] Mehraveh Salehi, Amin Karbasi, Dustin Scheinost, and R. Todd Constable. A submodular approach to create individualized parcellations of the human brain. In MICCAI, pages 478–485, 2017.
[38] A. Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer, 2003.
[39] Matthias Seeger. Greedy forward selection in the informative vector machine. Technical report, University of California at Berkeley, 2004.
[40] Sebastian Tschiatschek, Rishabh K. Iyer, Haochen Wei, and Jeff A. Bilmes. Learning mixtures of submodular functions for image collection summarization. In NIPS, pages 1413–1421, 2014.
[41] UberDataset. Uber pickups in New York City, 2014. URL https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city.
[42] Ashwinkumar Badanidiyuru Varadaraja. Buyback problem – approximate matroid intersection with cancellation costs. In ICALP, pages 379–390, 2011.
[43] Jan Vondrák. Symmetry and approximability of submodular maximization problems. SIAM J. Comput., 42(1):265–304, 2013.
[44] Yanhao Wang, Yuchen Li, and Kian-Lee Tan.
Efficient streaming algorithms for submodular maximization with multi-knapsack constraints. CoRR, abs/1706.04764, 2017. URL http://arxiv.org/abs/1706.04764.
[45] Justin Ward. A (k+3)/2-approximation algorithm for monotone submodular k-set packing and general k-exchange systems. In STACS, pages 42–53, 2012.", "award": [], "sourceid": 418, "authors": [{"given_name": "Moran", "family_name": "Feldman", "institution": "Open University of Israel"}, {"given_name": "Amin", "family_name": "Karbasi", "institution": "Yale"}, {"given_name": "Ehsan", "family_name": "Kazemi", "institution": "Yale Institute for Network Science, Yale"}]}