{"title": "Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues", "book": "Advances in Neural Information Processing Systems", "page_first": 1656, "page_last": 1665, "abstract": "In this work we derive a variant of the classic Glivenko-Cantelli Theorem, which asserts uniform convergence of the empirical Cumulative Distribution Function (CDF) to the CDF of the underlying distribution. Our variant allows for tighter convergence bounds for extreme values of the CDF. We apply our bound in the context of revenue learning, which is a well-studied problem in economics and algorithmic game theory. We derive sample-complexity bounds on the uniform convergence rate of the empirical revenues to the true revenues, assuming a bound on the k'th moment of the valuations, for any (possibly fractional) k > 1. For uniform convergence in the limit, we give a complete characterization and a zero-one law: if the first moment of the valuations is finite, then uniform convergence almost surely occurs; conversely, if the first moment is infinite, then uniform convergence almost never occurs.", "full_text": "Submultiplicative Glivenko-Cantelli and\n\nUniform Convergence of Revenues\n\nNoga Alon\n\nTel Aviv University, Israel\nand Microsoft Research\n\nnogaa@tau.ac.il\n\nMoshe Babaioff\nMicrosoft Research\n\nmoshe@microsoft.com\n\nYannai A. Gonczarowski\n\nThe Hebrew University of Jerusalem, Israel\n\nand Microsoft Research\nyannai@gonch.name\n\nYishay Mansour\n\nTel Aviv University, Israel\nand Google Research, Israel\n\nmansour@tau.ac.il\n\nShay Moran\n\nInstitute for Advanced Study, Princeton\n\nAmir Yehudayoff\n\nTechnion \u2014 IIT, Israel\n\nshaymoran1@gmail.com\n\namir.yehudayoff@gmail.com\n\nAbstract\n\nIn this work we derive a variant of the classic Glivenko-Cantelli Theorem, which\nasserts uniform convergence of the empirical Cumulative Distribution Function\n(CDF) to the CDF of the underlying distribution. 
Our variant allows for tighter\nconvergence bounds for extreme values of the CDF.\nWe apply our bound in the context of revenue learning, which is a well-studied\nproblem in economics and algorithmic game theory. We derive sample-complexity\nbounds on the uniform convergence rate of the empirical revenues to the true\nrevenues, assuming a bound on the kth moment of the valuations, for any (possibly\nfractional) k > 1.\nFor uniform convergence in the limit, we give a complete characterization and a\nzero-one law: if the \ufb01rst moment of the valuations is \ufb01nite, then uniform conver-\ngence almost surely occurs; conversely, if the \ufb01rst moment is in\ufb01nite, then uniform\nconvergence almost never occurs.\n\n1\n\nIntroduction\n\nA basic task in machine learning is to learn an unknown distribution \u00b5, given access to samples\nfrom it. A natural and widely studied criterion for learning a distribution is approximating its\nCumulative Distribution Function (CDF). The seminal Glivenko-Cantelli Theorem [13, 6] addresses\nthis question when the distribution \u00b5 is over the real numbers. It determines the behavior of the\nempirical distribution function as the number of samples grows: let X1, X2, . . . be a sequence of i.i.d.\nrandom variables drawn from a distribution \u00b5 on R with Cumulative Distribution Function (CDF) F ,\nand let x1, x2, . . . be their realizations. The empirical distribution \u00b5n is\n\nn(cid:88)\n\ni=1\n\n\u00b5n (cid:44) 1\nn\n\n\u03b4xi,\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\f1\n\nn \u00b7(cid:12)(cid:12){1 \u2264 i \u2264 n : xi \u2264 t}(cid:12)(cid:12). The Glivenko-Cantelli Theorem formalizes the statement that \u00b5n\n\nwhere \u03b4xi is the constant distribution supported on xi. 
Let Fn denote the CDF of \u00b5n, i.e., Fn(t) (cid:44)\nconverges to \u00b5 as n grows, by establishing that Fn(t) converges to F (t), uniformly over all t \u2208 R:\nTheorem 1.1 (Glivenko-Cantelli Theorem, [13, 6]). Almost surely,\n\n(cid:12)(cid:12)Fn(t) \u2212 F (t)(cid:12)(cid:12) = 0.\n\nn\u2192\u221e sup\nlim\n\nt\n\nSome twenty years after Glivenko [13] and Cantelli [6] discovered this theorem, Dvoretzky, Kiefer,\nand Wolfowitz (DKW) [12] strengthened this result by giving an almost1 tight quantitative bound on\nthe convergence rate. In 1990, Massart [17] proved a tight inequality, con\ufb01rming a conjecture due to\nBirnbaum and McCarty [3]:\nTheorem 1.2 ([17]). Pr\n\n(cid:105) \u2264 2 exp(\u22122n\u00012) for all \u0001 > 0, n \u2208 N.\n\n(cid:104)\n\nsupt\n\n(cid:12)(cid:12)Fn(t) \u2212 F (t)(cid:12)(cid:12) > \u0001\n\u2200t :(cid:12)(cid:12)F (t) \u2212 Fn(t)(cid:12)(cid:12) \u2264 \u0001 \u00b7 F (t).\n\nThe above theorems show that, with high probability, F and Fn are close up to some additive error.\nWe would have liked to prove a stronger, multiplicative bound on the error:\n\nHowever, for some distributions, the above event has probability 0, no matter how large n is. For\nexample, assume that \u00b5 satis\ufb01es F (t) > 0 for all t. Since the empirical measure \u00b5n has \ufb01nite support,\nthere is t with Fn(t) = 0; for such a value of t, such a multiplicative approximation fails to hold.\nSo, the above multiplicative requirement is too strong to hold in general. A natural compromise is to\nconsider a submultiplicative bound:\n\n\u2200t :(cid:12)(cid:12)F (t) \u2212 Fn(t)(cid:12)(cid:12) \u2264 \u0001 \u00b7 F (t)\u03b1,\n\nwhere 0 \u2264 \u03b1 < 1. When \u03b1 = 0, this is the additive bound studied in the context of the Glivenko-\nCantelli Theorem. When \u03b1 = 1, this is the unattainable multiplicative bound. 
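To see the two regimes concretely, the following sketch (our own illustration, not from the paper: the uniform-distribution choice and all names are ours) estimates sup_t |F_n(t) − F(t)| for i.i.d. uniform samples, compares it to the additive rate promised by Theorem 1.2, and exhibits a point at which any fully multiplicative (α = 1) bound must fail:

```python
import math
import random

random.seed(0)
n = 10_000
xs = sorted(random.random() for _ in range(n))  # U[0,1]; true CDF is F(t) = t

def empirical_cdf(sample, t):
    """F_n(t): fraction of sample points that are <= t."""
    return sum(1 for v in sample if v <= t) / len(sample)

# sup_t |F_n(t) - F(t)|; for sorted data the supremum is attained at the
# sample points (the classical Kolmogorov-Smirnov statistic).
sup_dev = max(max((i + 1) / n - xs[i], xs[i] - i / n) for i in range(n))

# Massart's inequality: Pr[sup_t |F_n - F| > eps] <= 2 exp(-2 n eps^2),
# so sup_dev should be on the order of sqrt(ln(2/delta) / (2n)).
delta = 0.05
eps = math.sqrt(math.log(2 / delta) / (2 * n))

# A fully multiplicative bound (alpha = 1) fails: just below the smallest
# sample, F(t) = t > 0 while F_n(t) = 0.
t = xs[0] / 2
assert t > 0 and empirical_cdf(xs, t) == 0.0
print(sup_dev, eps)
```

Rerunning with larger n shrinks both quantities at the 1/√n rate, while the failure of the multiplicative bound below the smallest sample persists for every n.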
Our first main result shows that the case of α < 1 is attainable:

Theorem 1.3 (Submultiplicative Glivenko-Cantelli Theorem). Let ε > 0, δ > 0, and 0 ≤ α < 1. There exists n₀(ε, δ, α) such that for all n > n₀, with probability 1 − δ:

  ∀t : |F(t) − F_n(t)| ≤ ε · F(t)^α.

It is worth pointing out a central difference between Theorem 1.3 and other generalizations of the Glivenko-Cantelli Theorem: for example, the seminal work of Vapnik and Chervonenkis [24] shows that for every class of events F of VC dimension d, there is n₀ = n₀(ε, δ, d) such that for every n ≥ n₀, with probability 1 − δ it holds that ∀A ∈ F : |p(A) − p_n(A)| ≤ ε. This yields Glivenko-Cantelli by plugging F = {(−∞, t] : t ∈ R}, which has VC dimension 1. In contrast, the submultiplicative bound from Theorem 1.3 does not even extend to the VC dimension 1 class F = {{t} : t ∈ R}. Indeed, pick any distribution p over R such that p({t}) = 0 for every t, and observe that for every sample x_1, ..., x_n, it holds that p_n({x_i}) ≥ 1/n, whereas p({x_i}) = 0; therefore, as long as α > 0, it is never the case that

  |p({x_i}) − p_n({x_i})| ≤ ε · p({x_i})^α.

Our second main result gives an explicit upper bound on n₀(ε, δ, α):

Theorem 1.4 (Submultiplicative Glivenko-Cantelli Bound). Let ε, δ ≤ 1/4, and α < 1. Then

  n₀(ε, δ, α) ≤ max{ (ln(6/δ)/(2ε²)) · (εδ/3)^{−4α/(1−α)},  ((12·D + 4)/(δ(1−α))) · (10/ε²) · ln(D + 1) · (εδ/(6·ln((1+α)/(2α))))^{−4α/(1−α)} },

where D = ln(6/δ)/(2ε²).

Note that for fixed ε, δ, when α → 0 the above bound approaches the familiar O(ln(1/δ)/ε²) bound by DKW [12] and Massart [17] for α = 0. On the other hand, when α → 1 the above bound tends to ∞, reflecting the fact that the multiplicative variant of Glivenko-Cantelli (α = 1) does not hold. Theorems 1.3 and 1.4 are proven in the supplementary material.

Note that the dependency of the above bound on the confidence parameter δ is polynomial. This contrasts with standard uniform convergence rates, which, due to applications of concentration bounds such as Chernoff/Hoeffding, achieve logarithmic dependencies on δ. These concentration bounds are not applicable in our setting when the CDF values are very small, and we use Markov's inequality instead. The following example shows that a polynomial dependency on δ is indeed necessary and is not due to a limitation of our proof.

Example 1.5. For large n, consider n independent samples x_1, ..., x_n from the uniform distribution over [0, 1], and set α = 1/2 and ε = 1. The probability of the event

  ∃i : x_i ≤ 1/n³

is roughly 1/n²: indeed, the complementary event has probability (1 − 1/n³)^n ≈ exp(−1/n²) ≈ 1 − 1/n². When this happens, we have: F_n(1/n³) ≥ 1/n ≫ 1/n³ + 1/n^{3/2} = F(1/n³) + [F(1/n³)]^{1/2}. Note that this happens with probability inverse polynomial in n (roughly 1/n²) and not inverse exponential.

¹The inequality due to [12] has a larger constant C in front of the exponent on the right hand side.

An application to revenue learning.
We demonstrate an application of our Submultiplicative Glivenko-Cantelli Theorem in the context of a widely studied problem in economics and algorithmic game theory: the problem of revenue learning. In the setting of this problem, a seller has to decide which price to post for a good she wishes to sell. Assume that each consumer draws her private valuation for the good from an unknown distribution µ. We envision that a consumer with valuation v will buy the good at any price p ≤ v, but not at any higher price. This implies that the expected revenue at price p is simply r(p) ≜ p · q(p), where q(p) ≜ Pr_{V∼µ}[V ≥ p].

In the language of machine learning, this problem can be phrased as follows: the example domain Z ≜ R₊ is the set of all valuations v. The hypothesis space H ≜ R₊ is the set of all prices p. The revenue (which is a gain, rather than a loss) of a price p on a valuation v is the function p · 1{p≤v}. The well-known revenue maximization problem is to find a price p* that maximizes the expected revenue, given a sample of valuations drawn i.i.d. from µ. In this paper, we consider the more demanding revenue estimation problem: the problem of well-approximating r(p), simultaneously for all prices p, from a given sample of valuations. (This clearly also implies a good estimation of the maximum revenue and of a price that yields it.) More specifically, we address the following question: when do the empirical revenues, r_n(p) ≜ p · q_n(p), where q_n(p) ≜ Pr_{V∼µ_n}[V ≥ p] = (1/n) · |{1 ≤ i ≤ n : x_i ≥ p}|, uniformly converge to the true revenues r(p)? That is, we would like to show that for some n₀, for n ≥ n₀ we have with probability 1 − δ that

  ∀p : |r(p) − r_n(p)| ≤ ε.

The revenue estimation problem is a basic instance of the more general problem of uniform convergence of empirical estimates. The main challenge in this instance is that the prices are unbounded (and so are the private valuations that are drawn from the distribution µ).

Unfortunately, there is no (upper) bound on n₀ that is only a function of ε and δ. Moreover, even if we add the expectation of the valuations, i.e., E[V] where V is distributed according to µ, there is still no bound on n₀ that is a function of only those three parameters (see Section 2.3 for an example). In contrast, when we consider higher moments of the distribution µ, we are able to derive bounds on the value of n₀. These bounds are based on our Submultiplicative Glivenko-Cantelli Bound. Specifically, assume that E_{V∼µ}[V^{1+θ}] ≤ C for some θ > 0 and C ≥ 1. Then, we show that for any ε, δ ∈ (0, 1), we have

  Pr[∃v : |r(v) − r_n(v)| > ε] ≤ Pr[∃v : |q(v) − q_n(v)| > (ε/C^{1/(1+θ)}) · q(v)^{1/(1+θ)}].

This essentially reduces uniform convergence bounds to our Submultiplicative Glivenko-Cantelli variant.
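As a concrete illustration of the quantities in this reduction, here is a small sketch (our own; the Pareto-style valuation law with q(v) = v^{−3} and the helper names q_emp/r_emp are assumptions made for the example, not taken from the paper). It compares the empirical revenues r_n(p) = p · q_n(p) with r(p) = p · q(p) over a price grid reaching far into the tail; this law has E[V^{1+θ}] < ∞ for every θ < 2, so the theory predicts uniform closeness:

```python
import bisect
import random

random.seed(1)
n = 100_000
# Hypothetical valuation law with tail q(v) = Pr[V >= v] = v**(-3) for v >= 1,
# sampled by inverse transform; then r(p) = p * q(p) equals p for p < 1 and
# p**(-2) for p >= 1, and E[V**(1+theta)] is finite for theta < 2.
vals = sorted((1.0 - random.random()) ** (-1.0 / 3.0) for _ in range(n))

def q_emp(p):
    """Empirical quantile q_n(p): fraction of sampled valuations >= p."""
    return (n - bisect.bisect_left(vals, p)) / n

def r_emp(p):
    """Empirical revenue r_n(p) = p * q_n(p)."""
    return p * q_emp(p)

def r_true(p):
    return p if p < 1.0 else p ** (-2.0)

# Uniform closeness over prices spanning several decades, including prices
# far beyond the bulk of the sample.
prices = [i / 50 for i in range(1, 51)] + [10 ** (k / 25) for k in range(76)]
gap = max(abs(r_emp(p) - r_true(p)) for p in prices)
print(gap)  # small: r_n tracks r across the whole grid
```

Replacing the sampler with a law of infinite expectation (e.g. q(v) = 1/v) makes the gap stop shrinking, matching the zero-one law stated below.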
It then follows that there exists n₀(C, θ, ε, δ) such that for any n ≥ n₀, with probability at least 1 − δ,

  ∀v : |r_n(v) − r(v)| ≤ ε.

We remark that when θ is large, our bound yields n₀ ≈ O(ln(1/δ)/ε²), which recovers the standard sample complexity bounds obtainable via DKW [12] and Massart [17]. When θ → 0, our bound diverges to infinity, reflecting the fact (discussed above) that there is no bound on n₀ that depends only on ε, δ, and E[V]. Nevertheless, we find that E[V] qualitatively determines whether uniform convergence occurs in the limit. Namely, we show that

• If E_µ[V] < ∞, then almost surely lim_{n→∞} sup_v |r(v) − r_n(v)| = 0.
• Conversely, if E_µ[V] = ∞, then almost never lim_{n→∞} sup_v |r(v) − r_n(v)| = 0.

1.1 Related work

Generalizations of Glivenko-Cantelli. Various generalizations of the Glivenko-Cantelli Theorem have been established. These include uniform convergence bounds for more general classes of functions, as well as for more general loss functions (for example, [24, 23, 16, 2]). The results that concern unbounded loss functions are the most relevant to this work (for example, [9, 8, 23]). We next briefly discuss the relevant results from Cortes et al. [8] in the context of this paper; more specifically, in the context of Theorem 1.3. To ease presentation, set α in this theorem to be 1/2.
Theorem 1.3 analyzes the event where the empirical quantile is bounded by²

  q_n(p) ≤ q(p) + ε·√q(p),
  q_n(p) ≥ q(p) − ε·√q(p),

whereas [8] analyzes the event where it is bounded by:

  q_n(p) ≤ Õ(q(p) + √(q(p)/n) + 1/n),
  q_n(p) ≥ Ω̃(q(p) − √(q_n(p)/n) − 1/n).

Thus, the main difference is the additive 1/n term in the bound from [8]. In the context of uniform convergence of revenues, it is crucial to use the upper bound on the empirical quantile as we do, as it guarantees that large prices will not overfit, which is the main challenge in proving uniform convergence in this context. In particular, the upper bound from [8] does not provide any guarantee on the revenues of prices p ≫ n, as for such prices p · 1/n ≫ 1.

It is also worth pointing out that our lower bound on the empirical quantile implies that with high probability the quantile of the maximum sampled point is at least 1/n² (or more generally, at least 1/n^{1/α} when α ≠ 1/2), while the bound from [8] does not imply any non-trivial lower bound. Another, more qualitative difference is that unlike the bounds in [8], which apply to general VC classes, our bound is tailored for the class of thresholds (corresponding to CDFs/quantiles), and does not extend even to other classes of VC dimension 1 (see the discussion after Theorem 1.3).

Uniform convergence of revenues. The problem of revenue maximization is a central problem in economics and Algorithmic Game Theory (AGT). The seminal work of Myerson [20] shows that given a valuation distribution for a single good, the revenue-maximizing selling mechanism for this good is a posted-price mechanism. In recent years, there has been growing interest in the case where the valuation distribution is unknown, but the seller observes samples drawn from it.
Most papers in this direction assume that the distribution meets some tail condition that is considered "natural" within the algorithmic game theory community, such as boundedness [18, 21, 19, 1, 14, 10]³, a condition known as Myerson-regularity [11, 15, 7, 10], or a condition known as monotone hazard rate [15].⁴ These papers then go on to derive computation- or sample-complexity bounds on learning an optimal price (or an optimal selling mechanism from a given class) for a distribution that meets the assumed condition.

²For consistency with the canonical statement of the Glivenko-Cantelli theorem, we stated our submultiplicative variants of this theorem with regard to the CDFs F_n and F. However, these results also hold when replacing these CDFs with the respective quantiles (tail CDFs) q_n and q. See Section 2.2 for details.

³The analysis of [1] assumes a bound on the realized revenue (from any possible valuation profile) of any mechanism/auction in the class that they consider. For the class of posted-price mechanisms, this is equivalent to assuming a bound on the support of the valuation distribution. Indeed, for any valuation v, pricing at v gives realized revenue v (from the valuation v), and so unbounded valuations (together with the ability to post unbounded prices) imply unbounded realized revenues.

⁴Both Myerson-regularity and monotone hazard rate are conditions on the second derivative of the revenue as a function of the quantile of the underlying distribution. In particular, they impose restrictions on the tail of the distribution.

A recurring theme in statistical learning theory is that learnability guarantees are derived via a, sometimes implicit, uniform convergence bound. However, this has not been the case in the context of revenue learning. Indeed, while some papers that studied bounded distributions [18, 21, 19, 1] did use uniform convergence bounds as part of their analysis, other papers, in particular those that considered unbounded distributions, had to bypass the use of uniform convergence with more specialized arguments. This is due to the fact that many unbounded distributions do not satisfy any uniform convergence bound. As a concrete example, the (unbounded, Myerson-regular) equal revenue distribution⁵ has an infinite expectation and therefore, by our Theorem 2.3, satisfies no uniform convergence, even in the limit. Thus, it turns out that the works that studied the popular class of Myerson-regular distributions [11, 15, 7, 10] indeed could not have hoped to establish learnability via a uniform convergence argument. For instance, the way [11, 7] establish learnability for Myerson-regular distributions is by considering the guarded ERM algorithm (an algorithm that chooses an empirical-revenue-maximizing price that is smaller than, say, the √n-th largest sampled price), and proving a uniform convergence bound, not for all prices, but only for prices that are, say, smaller than the √n-th largest sampled price, and then arguing that larger prices are likely to have a small empirical revenue compared to the guarded empirical revenue maximizer. This means that the guarded ERM will output a good price, but it does not (and cannot) imply uniform convergence for all prices.

We complement the extensive literature surveyed above in a few ways. The first is generalizing the revenue maximization problem to a revenue estimation problem, where the goal is to uniformly estimate the revenue of all possible prices, when no bound on the possible valuations is given (or even exists).
The problem of revenue estimation arises naturally when the seller has additional considerations when pricing her good, such as regulations that limit the price choice, bad publicity if the price is too high (or, conversely, damage to prestige if the price is too low), or willingness to suffer some revenue loss for better market penetration (which may translate to more revenue in the future). In such a case, the seller may wish to estimate the revenue loss due to posting a discounted (or inflated) price.

The second, and most important, contribution to the above literature is that we consider arbitrary distributions rather than very specific and limited classes of distributions (e.g., bounded, Myerson-regular, monotone hazard rate, etc.). Third, we derive finite-sample bounds when some moment of the valuation of order larger than 1 is bounded. We further derive a zero-one law for uniform convergence in the limit that depends on the finiteness of the first moment. Technically, our bounds are based on additive errors rather than multiplicative ones, which are popular in the AGT community.

1.2 Paper organization

The rest of the paper is organized as follows. Section 2 contains the application of our Submultiplicative Glivenko-Cantelli to revenue estimation, and Section 3 contains a discussion and possible directions for future work. The proof of the Submultiplicative Glivenko-Cantelli variant, and some extensions of it, appear in the supplementary material.

2 Uniform Convergence of Empirical Revenues

In this section we demonstrate an application of our Submultiplicative Glivenko-Cantelli variant by establishing uniform convergence bounds for a family of unbounded random variables in the context of revenue estimation.

2.1 Model

Consider a good g that we wish to post a price for. Let V be a random variable that models the valuation of a random consumer for g.
Technically, it is assumed that V is a nonnegative random variable, and we denote by µ its induced distribution over R₊. A consumer who values g at a valuation v is willing to buy the good at any price p ≤ v, but not at any higher price. This implies that the realized revenue to the seller from a (posted) price p is the random variable p · 1{p≤V}. The quantile of a value v ∈ R₊ is

  q(v) = q(v; µ) ≜ µ({x : x ≥ v}).

This models the fraction of the consumers in the population that are willing to purchase the good if it is priced at v. The expected revenue from a (posted) price p ∈ R₊ is

  r(p) = r(p; µ) ≜ E_µ[p · 1{p≤V}] = p · q(p).

Let V_1, V_2, ... be a sequence of i.i.d. valuations drawn from µ, and let v_1, v_2, ... be their realizations. The empirical quantile of a value v ∈ R₊ is

  q_n(v) = q(v; µ_n) ≜ (1/n) · |{1 ≤ i ≤ n : v_i ≥ v}|.

The empirical revenue from a price p ∈ R₊ is

  r_n(p) = r(p; µ_n) ≜ E_{µ_n}[p · 1{p≤V}] = p · q_n(p).

The revenue estimation error for a given sample of size n is

  ε_n ≜ sup_p |r_n(p) − r(p)|.

⁵This is a distribution that satisfies the special property that all prices have the same expected revenue.

It is worth highlighting the difference between revenue estimation and revenue maximization. Let p* be a price that maximizes the revenue, i.e., p* ∈ arg sup_p r(p).
The maximum revenue is r* = r(p*). The goal in many works on revenue maximization is to find a price p̂ such that r* − r(p̂) ≤ ε, or alternatively, to bound the ratio r*/r(p̂).

Given a revenue-estimation error ε_n, one can clearly maximize the revenue within an additive error of 2ε_n by simply posting a price p*_n ∈ arg max_p r_n(p), thereby attaining revenue r*_n = r(p*_n). This follows since

  r*_n = r(p*_n) ≥ r_n(p*_n) − ε_n ≥ r_n(p*) − ε_n ≥ r(p*) − 2ε_n = r* − 2ε_n.

Therefore, good revenue estimation implies good revenue maximization.

We note that the converse does not hold. Namely, there are distributions for which revenue maximization is trivial but revenue estimation is impossible. One such case is the equal revenue distribution, where all values in the support of µ have the same expected revenue. For such distributions, the problem of revenue maximization becomes trivial, since any posted price is optimal. However, as follows from Theorem 2.3, since the expected valuation under such a distribution is infinite, almost never do the empirical revenues uniformly converge to the true revenues.

2.2 Quantitative bounds on the uniform convergence rate

Recall that we are interested in deriving sample bounds that would guarantee uniform convergence for the revenue estimation problem. We will show that given an upper bound on the kth moment of V for some k > 1, we can derive a finite sample bound. To this end we utilize our Submultiplicative Glivenko-Cantelli Bound (Theorem 1.4).

We also consider the case of k = 1, namely that E[V] is bounded, and show that in this case there is still uniform convergence in the limit, but that there cannot be any guarantees on the convergence rate.
Interestingly, it turns out that E[V] < ∞ is not only sufficient but also necessary for the empirical revenues to uniformly converge, in the limit, to the true revenues (see Section 2.3).

We begin by showing that bounds on the kth moment for k > 1 yield explicit bounds on the convergence rate. It is convenient to parametrize by setting k = 1 + θ, where θ > 0.

Theorem 2.1. Let E_{V∼µ}[V^{1+θ}] ≤ C for some θ > 0 and C ≥ 1, and let ε, δ ∈ (0, 1). Set⁶

  n₀ = Õ( (ln(1/δ)/ε²) · C^{2/(1+θ)} · (6·C^{1/(1+θ)}/(εδ·ln(1 + θ/2)))^{4/θ} ).   (1)

For any n ≥ n₀, with probability at least 1 − δ,

  ∀v : |r_n(v) − r(v)| ≤ ε.

⁶The Õ conceals low order terms.

Note that when θ is large, this bound approaches the standard O(ln(1/δ)/ε²) sample complexity bound of the additive Glivenko-Cantelli. For example, if all moments are uniformly bounded, then the convergence is roughly as fast as in standard uniform convergence settings (e.g., VC-dimension based bounds).

The proof of Theorem 2.1 follows from Theorem 1.4 and the next proposition, which reduces bounds on the uniform convergence rate of the empirical revenues to our Submultiplicative Glivenko-Cantelli.

Proposition 2.2. Let E_{V∼µ}[V^{1+θ}] ≤ C for some θ > 0 and C ≥ 1, and let ε, δ ∈ (0, 1).
Then,

  Pr[∃v : |r(v) − r_n(v)| > ε] ≤ Pr[∃v : |q(v) − q_n(v)| > (ε/C^{1/(1+θ)}) · q(v)^{1/(1+θ)}].

Thus, to prove Theorem 2.1, we first note that Theorem 1.4 (as well as Theorem 1.3) also holds when F_n and F are respectively replaced in the definition of n₀ with q_n and q (indeed, applying Theorem 1.4 to the measure µ′ defined by µ′(A) ≜ µ({−a | a ∈ A}) yields the required result with regard to the measure µ). We then plug ε ← ε/C^{1/(1+θ)} and α ← 1/(1+θ) into this variant of Theorem 1.4 to yield a bound on the right-hand side of the inequality in Proposition 2.2, whose application concludes the proof.

Proof of Proposition 2.2. By Markov's inequality:

  q(v) = Pr[V ≥ v] = Pr[V^{1+θ} ≥ v^{1+θ}] ≤ C/v^{1+θ}.   (2)

Now,

  Pr[∃v : |r(v) − r_n(v)| > ε]
  = Pr[∃v : |v·q(v) − v·q_n(v)| > ε]
  = Pr[∃v : |v·q(v) − v·q_n(v)| > (ε/(v^{1+θ}·q(v))^{1/(1+θ)}) · (v^{1+θ}·q(v))^{1/(1+θ)}]
  ≤ Pr[∃v : |v·q(v) − v·q_n(v)| > (ε/C^{1/(1+θ)}) · (v^{1+θ}·q(v))^{1/(1+θ)}]
  = Pr[∃v : |q(v) − q_n(v)| > (ε/C^{1/(1+θ)}) · q(v)^{1/(1+θ)}],

where the inequality follows from Equation (2).

2.3 A qualitative characterization of uniform convergence

The sample complexity bounds in Theorem 2.1 are meaningful
as long as θ > 0, but deteriorate drastically as θ → 0. Indeed, as the following example shows, there is no bound on the uniform convergence sample complexity that depends only on the first moment of V, i.e., its expectation. Consider a distribution η_p such that with probability p we have V = 1/p, and otherwise V = 0. Clearly, E[V] = 1. However, we need to sample m_p = Ω(1/p) valuations to see even a single nonzero value. Therefore, there is no bound on the sample size m_p as a function of the expectation, which is simply 1.

We can now consider the higher moments of η_p. Consider the kth moment, for k = 1 + θ and θ > 0, so k > 1. For this moment, we have A_{p,θ} = E[V^{1+θ}] = p^{−θ}, which implies that m_p = O((A_{p,θ})^{1/θ}). This does allow us to bound m_p as a function of θ and E[V^{1+θ}], but for small θ we have a huge exponent of approximately 1/θ.

While the above examples show that there cannot be a bound on the sample size as a function of the expectation of the value, it turns out that there is a very tight connection between the first moment and uniform convergence:

Theorem 2.3. The following dichotomy holds for a distribution µ on R₊:

1. If E_µ[V] < ∞, then almost surely lim_{n→∞} sup_v |r(v) − r_n(v)| = 0.
2. If E_µ[V] = ∞, then almost never lim_{n→∞} sup_v |r(v) − r_n(v)| = 0.

That is, the empirical revenues uniformly converge to the true revenues if and only if E_µ[V] < ∞. We use the following basic fact in the proof of Theorem 2.3:

Lemma 2.4. Let X be a nonnegative random variable. Then

  ∑_{n=1}^{∞} Pr[X ≥ n] ≤ E[X] ≤ ∑_{n=0}^{∞} Pr[X ≥ n].

Proof.
Note that

  ∑_{n=1}^{∞} 1{X≥n} = ⌊X⌋ ≤ X ≤ ⌊X⌋ + 1 = ∑_{n=0}^{∞} 1{X≥n}.

The lemma follows by taking expectations.

Proof of Theorem 2.3. We start by proving item 2. Let µ be a distribution such that E_µ[V] = ∞. If sup_v v·q(v) = ∞, then for every realization v_1, ..., v_n there is some v ≥ max{v_1, ..., v_n} such that v·q(v) ≥ 1, but v·q_n(v) = 0. So, we may assume sup_v v·q(v) < ∞. Without loss of generality we may assume that sup_v v·q(v) = 1/2, by rescaling the distribution if needed. Consider the sequence of events E_1, E_2, ..., where E_n denotes the event that V_n ≥ n. Since E_µ[V] = ∞, Lemma 2.4 implies that ∑_{n=1}^{∞} Pr[E_n] = ∞. Thus, since these events are independent, the second Borel-Cantelli Lemma [4, 5] implies that almost surely, infinitely many of them occur, and so infinitely often

  V_n · q_n(V_n) ≥ 1 ≥ V_n · q(V_n) + 1/2.

Therefore, the probability that v·q_n(v) uniformly converges to v·q(v) is 0.

Item 1 follows from the following monotone domination theorem:

Theorem 2.5. Let F be a family of nonnegative monotone functions, and let F be an upper envelope⁷ for F. If E_µ[F] < ∞, then almost surely:

  lim_{n→∞} sup_{f∈F} |E_µ[f] − E_{µ_n}[f]| = 0.

Indeed, item 1 follows by plugging F = {v · 1{x≥v} : v ∈ R₊}, which is uniformly bounded by the identity function F(x) = x.
Now, by assumption E\u00b5[F ] < \u221e, and therefore, almost surely\n\nn\u2192\u221e sup\nlim\nf\u2208F\n\n\u00b5n\n\n(cid:12)(cid:12)r(v) \u2212 rn(v)(cid:12)(cid:12) = lim\n\nn\u2192\u221e sup\nf\u2208F\n\n[f ] \u2212 E\n\n\u00b5n\n\n[f ](cid:12)(cid:12) = 0.\n\nn\u2192\u221e sup\nlim\nv\u2208R+\n\n(cid:12)(cid:12)E\n\n\u00b5\n\n[f ] \u2212 E\n\n[f ](cid:12)(cid:12) = 0.\n(cid:12)(cid:12)E\n\n\u00b5\n\nTheorem 2.5 follows by known results in the theory of empirical processes (for example, with some\nwork it can be proved using Theorem 2.4.3 from Vaart and Wellner [22]). For completeness, we give\na short and basic proof in the supplementary material.\n\n3 Discussion\n\nOur main result is a submultiplicative variant of the Glivenko-Cantelli Theorem, which allows for\ntighter convergence bounds for extreme values of the CDF. We show that for the revenue learning\nsetting our submultiplicative bound can be used to derive uniform convergence sample complexity\nbounds, assuming a \ufb01nite bound on the kth moment of the valuations, for any (possibly fractional)\nk > 1. For uniform convergence in the limit, we give a complete characterization, where uniform\nconvergence almost surely occurs if and only if the \ufb01rst moment is \ufb01nite.\nIt would be interesting to \ufb01nd other applications of our submultiplicative bound in other settings. A\npotentially interesting direction is to consider unbounded loss functions (e.g., the squared-loss, or\nlog-loss). Many works circumvent the unboundedness in such cases by ensuring (implicitly) that\nthe losses are bounded, e.g., through restricting the inputs and the hypotheses. Our bound offers a\ndifferent perspective of addressing this issue. In this paper we consider revenue learning, and replace\nthe boundedness assumption by assuming bounds on higher moments. 
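To make the earlier $\eta_p$ example concrete, the following is a minimal Python sketch (the function names are ours, and the reduction of the supremum to the single candidate price $v = 1/p$ is a simplification valid only for this two-point distribution): it computes $\sup_v |r(v) - r_n(v)|$ for a sample from $\eta_p$ and shows the gap staying at its maximum until a nonzero valuation appears, even though $\mathbb{E}[V] = 1$.

```python
def empirical_revenue(samples, v):
    """Empirical revenue r_n(v) = v * q_n(v), where q_n(v) is the
    fraction of sampled valuations that are at least v."""
    return v * sum(1 for s in samples if s >= v) / len(samples)

def sup_revenue_gap(samples, p):
    """sup_v |r(v) - r_n(v)| for the two-point distribution eta_p
    (V = 1/p with probability p, V = 0 otherwise).

    For eta_p, both r(v) = v * Pr[V >= v] and r_n(v) are linear in v
    on (0, 1/p] and vanish beyond it, so the supremum is attained at
    v = 1/p, where the true revenue is r(1/p) = (1/p) * p = 1."""
    v = 1.0 / p
    return abs(1.0 - empirical_revenue(samples, v))
```

Until roughly $1/p$ valuations are sampled, every draw is typically 0, so the empirical revenue curve is identically 0 and the gap is 1; the gap vanishes exactly when the empirical frequency of the value $1/p$ matches $p$.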
⁷$F$ is an upper envelope for $\mathcal{F}$ if $F(v) \ge f(v)$ for every $v \in V$ and $f \in \mathcal{F}$.

An interesting challenge is to prove uniform convergence bounds for other practically interesting settings. One such setting might be estimating the effect of outliers (which correspond to the extreme values of the loss).

In the context of revenue estimation, this work considers only the most naïve estimator, namely estimating the revenues by the empirical revenues. One can envision other estimators, for example ones that regularize the extreme tail of the sample. Such estimators may potentially offer better guarantees or better convergence bounds. In the context of uniform convergence of selling-mechanism revenues, this work considers only the basic class of posted-price mechanisms. While for one good and one valuation distribution it is always possible to maximize revenue via a selling mechanism of this class, this is not the case in more complex auction environments. Although in many more-complex environments the revenue-maximizing mechanism/auction is still not well enough understood, for environments where it is understood [7, 10, 14] (as well as for simple auction classes that do not necessarily contain a revenue-maximizing auction [19, 1]), it would also be interesting to study relaxations of the restrictive tail or boundedness assumptions currently common in the literature.

Acknowledgments

The research of Noga Alon is supported in part by an ISF grant and by a GIF grant.
Yannai Gonczarowski is supported by the Adams Fellowship Program of the Israel Academy of Sciences and Humanities; his work is supported by ISF grant 1435/14 administered by the Israeli Academy of Sciences and by Israel-USA Bi-national Science Foundation (BSF) grant number 2014389; this project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 740282). The research of Yishay Mansour was supported in part by The Israeli Centers of Research Excellence (I-CORE) program (Center No. 4/11), by a grant from the Israel Science Foundation, and by a grant from the United States-Israel Binational Science Foundation (BSF); the research was done while the author was co-affiliated with Microsoft Research. The research of Shay Moran is supported by the National Science Foundation and the Simons Foundation; part of the research was done while the author was co-affiliated with Microsoft Research. The research of Amir Yehudayoff is supported by ISF grant 1162/15.

References

[1] Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Sample complexity of automated mechanism design. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS), pages 2083–2091, 2016.

[2] Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.

[3] Z. W. Birnbaum and R. C. McCarty. A distribution-free upper confidence bound for Pr{Y < X}, based on independent samples of X and Y. The Annals of Mathematical Statistics, 29(2):558–562, 1958.

[4] Émile Borel. Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo (1884-1940), 27(1):247–271, 1909.

[5] Francesco Paolo Cantelli.
Sulla probabilità come limite della frequenza. Atti Accad. Naz. Lincei, 26(1):39–45, 1917.

[6] Francesco Paolo Cantelli. Sulla determinazione empirica delle leggi di probabilità. Giornale dell'Istituto Italiano degli Attuari, 4:421–424, 1933.

[7] Richard Cole and Tim Roughgarden. The sample complexity of revenue maximization. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC), pages 243–252, 2014.

[8] Corinna Cortes, Spencer Greenberg, and Mehryar Mohri. Relative deviation learning bounds and generalization with unbounded loss functions. CoRR, abs/1310.5796, 2013.

[9] Corinna Cortes, Yishay Mansour, and Mehryar Mohri. Learning bounds for importance weighting. In Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS), pages 442–450, 2010.

[10] Nikhil R. Devanur, Zhiyi Huang, and Christos-Alexandros Psomas. The sample complexity of auctions with side information. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), pages 426–439, 2016.

[11] Peerapong Dhangwatnotai, Tim Roughgarden, and Qiqi Yan. Revenue maximization with a single sample. Games and Economic Behavior, 91:318–333, 2015.

[12] Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27(3):642–669, 1956.

[13] V. Glivenko. Sulla determinazione empirica delle leggi di probabilità. Giornale dell'Istituto Italiano degli Attuari, 4:92–99, 1933.

[14] Yannai A. Gonczarowski and Noam Nisan. Efficient empirical revenue maximization in single-parameter auction environments. In Proceedings of the 49th Annual ACM Symposium on Theory of Computing (STOC), pages 856–868, 2017.

[15] Zhiyi Huang, Yishay Mansour, and Tim Roughgarden. Making the most of your samples.
In Proceedings of the 16th ACM Conference on Economics and Computation (EC), pages 45–60, 2015.

[16] Vladimir Koltchinskii and Dmitriy Panchenko. Rademacher Processes and Bounding the Risk of Function Learning, pages 443–457. Birkhäuser Boston, Boston, MA, 2000.

[17] Pascal Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability, 18(3):1269–1283, 1990.

[18] Jamie Morgenstern and Tim Roughgarden. On the pseudo-dimension of nearly optimal auctions. In Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS), pages 136–144, 2015.

[19] Jamie Morgenstern and Tim Roughgarden. Learning simple auctions. In Proceedings of the 29th Annual Conference on Learning Theory (COLT), pages 1298–1318, 2016.

[20] Roger Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981.

[21] Tim Roughgarden and Okke Schrijvers. Ironing in the dark. In Proceedings of the 17th ACM Conference on Economics and Computation (EC), pages 1–18, 2016.

[22] A. W. van der Vaart and Jon August Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, New York, 1996. Reprinted with corrections, 2000.

[23] Vladimir Vapnik. Statistical Learning Theory. Wiley, 1998.

[24] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab.
Appl., 16:264–280, 1971.