{"title": "Concentration of risk measures: A Wasserstein distance approach", "book": "Advances in Neural Information Processing Systems", "page_first": 11762, "page_last": 11771, "abstract": "Known finite-sample concentration bounds for the Wasserstein distance between the empirical and true distribution of a random variable are used to derive a two-sided concentration bound for the error between the true conditional value-at-risk (CVaR) of a (possibly unbounded) random variable and a standard estimate of its CVaR computed from an i.i.d. sample. The bound applies under fairly general assumptions on the random variable, and improves upon previous bounds which were either one sided, or applied only to bounded random variables. Specializations of the bound to sub-Gaussian and sub-exponential random variables are also derived. A similar procedure is followed to derive concentration bounds for the error between the true and estimated Cumulative Prospect Theory (CPT) value of a random variable, in cases where the random variable is bounded or sub-Gaussian. These bounds are shown to match a known bound in the bounded case, and improve upon the known bound in the sub-Gaussian case. The usefulness of the bounds is illustrated through an algorithm, and corresponding regret bound for a stochastic bandit problem, where the underlying risk measure to be optimized is CVaR.", "full_text": "Concentration of risk measures:\nA Wasserstein distance approach\n\nSanjay P. 
Bhat
Tata Consultancy Services Limited
Hyderabad, Telangana, India
sanjay.bhat@tcs.com

Prashanth L.A.*
Department of Computer Science and Engineering
Indian Institute of Technology Madras, India
prashla@cse.iitm.ac.in

Abstract

Known finite-sample concentration bounds for the Wasserstein distance between the empirical and true distribution of a random variable are used to derive a two-sided concentration bound for the error between the true conditional value-at-risk (CVaR) of a (possibly unbounded) random variable and a standard estimate of its CVaR computed from an i.i.d. sample. The bound applies under fairly general assumptions on the random variable, and improves upon previous bounds which were either one-sided, or applied only to bounded random variables. Specializations of the bound to sub-Gaussian and sub-exponential random variables are also derived. Using a different proof technique, the results are extended to the class of spectral risk measures having a bounded risk spectrum. A similar procedure is followed to derive concentration bounds for the error between the true and estimated Cumulative Prospect Theory (CPT) value of a random variable, in cases where the random variable is bounded or sub-Gaussian. These bounds are shown to match a known bound in the bounded case, and to improve upon the known bound in the sub-Gaussian case. The usefulness of the bounds is illustrated through an algorithm, and a corresponding regret bound, for a stochastic bandit problem where the underlying risk measure to be optimized is CVaR.

1 Introduction

Conditional Value-at-Risk (CVaR) and cumulative prospect theory (CPT) value are two popular risk measures. CVaR is popular in financial applications, where it is necessary to minimize worst-case losses, say in a portfolio optimization context. CVaR is a special instance of the class of spectral risk measures [Acerbi, 2002].
CVaR is an appealing risk measure because it is coherent [Artzner et al., 1999], and spectral risk measures retain this property. CPT value is a risk measure, proposed by Tversky and Kahneman, that is useful for modeling human preferences. The central premise in risk-sensitive optimization is that the expected value is not an appealing objective in several practical applications, and it is necessary to incorporate some notion of risk in the optimization process. The reader is referred to the extensive literature on risk-sensitive optimization, in particular on the shortcomings of the expected value; cf. [Allais, 1953, Ellsberg, 1961, Kahneman and Tversky, 1979, Rockafellar and Uryasev, 2000].

In practical applications, information about the underlying distribution is typically unavailable. However, one can often obtain samples from the distribution, and the aim is to estimate the chosen risk measure using these samples. We consider this estimation problem in the context of three risk measures: CVaR, a general spectral risk measure, and CPT-value. For each of the three risk measures, we examine the estimator obtained by applying the risk measure to the empirical distribution constructed from an i.i.d. sample. In the case of CVaR and CPT value, the estimators obtained in this way are already available in the literature. Our goal is to derive concentration bounds for estimators of all three risk measures, and we achieve this in a novel manner by relating the estimation error to the Wasserstein distance between the empirical and true distributions, and then using known concentration bounds for the latter.

*Supported in part by a DST grant under the ECRA program.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
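To make the central quantity of this approach concrete before the formal development, the sketch below numerically illustrates how the Wasserstein distance between an empirical sample and the true distribution shrinks as the sample size grows. This is illustrative only and not part of the paper: the standard-Gaussian example, the sample sizes, and the use of SciPy's `wasserstein_distance` (with a large reference sample standing in for the true distribution) are all assumptions made for the demo.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def empirical_w1(n, n_ref=200_000):
    """Approximate W1(F_n, F) for a standard Gaussian F by comparing an
    n-sample empirical distribution with a large reference sample that
    stands in for the true CDF F (illustrative shortcut)."""
    sample = rng.normal(size=n)         # i.i.d. draws defining the EDF F_n
    reference = rng.normal(size=n_ref)  # proxy for the true distribution F
    return wasserstein_distance(sample, reference)

for n in (100, 1_000, 10_000):
    print(n, empirical_w1(n))  # the distance typically shrinks as n grows
```

The decay of these values with `n` is exactly what the finite-sample Wasserstein concentration bounds used later quantify.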
We summarize our contributions below; they apply when the underlying distribution has a bounded exponential moment, or a higher-order moment. Sub-Gaussian distributions are a popular class satisfying the former condition, while the latter includes sub-exponential distributions.

(1) For the case of CVaR, we provide a two-sided concentration bound for both classes of distributions mentioned above. In particular, for the special case of sub-Gaussian distributions, our tail bound is of the order $O\left(\exp\left(-cn\epsilon^2\right)\right)$, where $n$ is the number of samples, $\epsilon$ is the accuracy parameter, and $c$ is a universal constant. Our bound matches the rate obtained for distributions with bounded support in [Brown, 2007], and features improved dependence on $\epsilon$ as compared to the bound derived for sub-Gaussian distributions in [Kolla et al., 2019b]. Further, unlike the latter work, we provide two-sided concentration bounds for CVaR estimation. Similar bounds are shown to hold for any spectral risk measure having a bounded risk spectrum.

(2) For the case of CPT-value, we obtain a bound of order $O\left(\exp\left(-cn\epsilon^2\right)\right)$ for distributions with bounded support, matching the rate in [Cheng et al., 2018]. For the case of sub-Gaussian distributions, we provide a bound that has an improved dependence on the number of samples $n$, as compared to the corresponding bound derived by [Cheng et al., 2018].

(3) As a minor contribution, our concentration bounds open avenues for bandit applications, and we illustrate this claim by considering a risk-sensitive bandit setting with CVaR as the underlying risk measure. For this bandit problem, with the underlying arms' distributions assumed to be sub-Gaussian, we derive a regret bound using the CVaR concentration bound mentioned above. Previous works (cf.
[Galichet et al., 2013]) consider CVaR optimization in a bandit context, with arms' distributions having bounded support.

Since CVaR and spectral risk measures are weighted averages of the underlying distribution's quantiles, a natural alternative to a Wasserstein-distance-based approach is to employ concentration results for quantiles, such as those in Kolla et al. [2019b]. While such an approach can provide bounds with better constants, the resulting bounds also involve distribution-dependent quantities (see Kolla et al. [2019b], for instance), and require different proofs for sub-Gaussian and sub-exponential random variables. In contrast, our approach provides a unified method of proof.

The rest of the paper is organized as follows: In Section 2, we cover background material that includes the Wasserstein distance, and sub-Gaussian and sub-exponential distributions. In Sections 3–5, we present concentration bounds for CVaR, spectral risk measures, and CPT-value estimation, respectively. In Section 6, we discuss a bandit application, and finally, in Section 7, we provide concluding remarks. The proofs of all the claims in Sections 3–5 are given in the supplementary material.

2 Wasserstein Distance

In this section, we introduce the notion of Wasserstein distance, a popular metric for measuring the proximity between two distributions. The reader is referred to Chapter 6 of [Villani, 2008] for a detailed introduction.

Given two cumulative distribution functions (CDFs) $F_1$ and $F_2$ on $\mathbb{R}$, let $\Gamma(F_1, F_2)$ denote the set of all joint distributions on $\mathbb{R}^2$ having $F_1$ and $F_2$ as marginals.

Definition 1.
Given two CDFs $F_1$ and $F_2$ on $\mathbb{R}$, the Wasserstein distance between them is defined by

$$W_1(F_1, F_2) = \inf_{F \in \Gamma(F_1, F_2)} \left[ \int_{\mathbb{R}^2} |x - y| \, dF(x, y) \right]. \tag{1}$$

Given $L > 0$ and $p > 0$, a function $f : \mathbb{R} \to \mathbb{R}$ is $L$-Hölder of order $p$ if $|f(x) - f(y)| \le L|x - y|^p$ for all $x, y \in \mathbb{R}$. The function $f : \mathbb{R} \to \mathbb{R}$ is $L$-Lipschitz if it is $L$-Hölder of order 1. Finally, if $F$ is a CDF on $\mathbb{R}$, we define the generalized inverse $F^{-1} : [0, 1] \to \mathbb{R}$ of $F$ by $F^{-1}(\beta) = \inf\{x \in \mathbb{R} : F(x) \ge \beta\}$. In the case where $F$ is strictly increasing and continuous, $F^{-1}$ equals the usual inverse of a bijective function.

The Wasserstein distance between the CDFs $F_1$ and $F_2$ of two random variables $X$ and $Y$, respectively, may alternatively be written as follows:

$$\sup_f |E(f(X)) - E(f(Y))| = W_1(F_1, F_2) = \int_{-\infty}^{\infty} |F_1(s) - F_2(s)| \, ds = \int_0^1 \left|F_1^{-1}(\beta) - F_2^{-1}(\beta)\right| d\beta, \tag{2}$$

where the supremum in (2) is over all functions $f : \mathbb{R} \to \mathbb{R}$ that are 1-Lipschitz. Equation (2) is stated and proved as a lemma in Bhat and Prashanth [2019].

The results that we provide in this paper pertain to the case where a r.v. $X$ satisfies either an exponential moment bound or a higher-order moment bound. We make these conditions precise below.

(C1) There exist $\beta > 0$ and $\gamma > 0$ such that $E\left(\exp\left(\gamma |X - \mu|^{\beta}\right)\right) < \top < \infty$, where $\mu = E(X)$.

(C2) There exists $\beta > 0$ such that $E\left(|X - \mu|^{\beta}\right) < \top < \infty$, where $\mu = E(X)$.

We next define sub-Gaussian and sub-exponential r.v.s, two popular classes of unbounded r.v.s that satisfy assumptions (C1) and (C2), respectively.

Definition 2. A r.v.
$X$ with mean $\mu$ is sub-Gaussian if there exists a $\sigma > 0$ such that

$$E(\exp(\lambda(X - \mu))) \le \exp\left(\frac{\lambda^2 \sigma^2}{2}\right) \quad \text{for any } \lambda \in \mathbb{R}.$$

A sub-Gaussian r.v. $X$ with mean $\mu$ satisfies (see items (II) and (IV) in Theorem 2.1 of [Wainwright, 2019] for a proof)

$$E\left(\exp\left(\frac{(X - \mu)^2}{4\sigma^2}\right)\right) \le \sqrt{2}, \quad \text{and} \quad P(X - \mu > \eta) \le 8 \exp\left(-\frac{\eta^2}{2\sigma^2}\right), \ \text{for } \eta \ge 0. \tag{3}$$

The first bound above implies that sub-Gaussian r.v.s satisfy (C1) with $\beta = 2$, $\gamma = \frac{1}{4\sigma^2}$ and $\top = \sqrt{2}$. In particular, bounded r.v.s are sub-Gaussian, and satisfy (C1) with $\beta = 2$.

Definition 3. A r.v. $X$ with mean $\mu$ is sub-exponential if there exist non-negative parameters $\sigma$ and $b$ such that

$$E(\exp(\lambda(X - \mu))) \le \exp\left(\frac{\lambda^2 \sigma^2}{2}\right) \quad \text{for any } |\lambda| < \frac{1}{b}.$$

A sub-exponential r.v. $X$ with mean $\mu$ satisfies (see items (III) and (IV) in Theorem 2.2 of [Wainwright, 2019] for a proof)

$$\sup_{k \ge 2} \left[\frac{E\left[(X - \mu)^k\right]}{k!}\right]^{1/k} < \infty, \quad \text{and} \quad \exists k_1, k_2 > 0 \text{ such that } P(X - \mu > \eta) \le k_1 \exp(-k_2 \eta), \ \forall \eta \ge 0. \tag{4}$$

The bound (4) implies that sub-exponential r.v.s satisfy (C2) for integer values of $\beta \ge 2$.

The following result from Fournier and Guillin [2015] bounds the Wasserstein distance between the empirical distribution function (EDF) of an i.i.d. sample and the underlying CDF from which the sample is drawn. Recall that, given i.i.d. samples $X_1, \ldots, X_n$ from the distribution $F$ of a r.v. $X$, the EDF $F_n$ is defined by

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I\{X_i \le x\}, \quad \text{for any } x \in \mathbb{R}. \tag{5}$$

Lemma 1. (Wasserstein distance bound) Let X be a r.v.
with CDF $F$ and mean $\mu$. Suppose that either (i) $X$ satisfies (C1) with $\beta > 1$, or (ii) $X$ satisfies (C2) with $\beta > 2$. Then, for any $\epsilon \ge 0$, we have

$$P(W_1(F_n, F) > \epsilon) \le B(n, \epsilon),$$

where, under (i),

$$B(n, \epsilon) = C\left(\exp\left(-cn\epsilon^2\right) I\{\epsilon \le 1\} + \exp\left(-cn\epsilon^{\beta}\right) I\{\epsilon > 1\}\right),$$

for some $C, c$ that depend on the parameters $\beta, \gamma$ and $\top$ specified in (C1); and, under (ii),

$$B(n, \epsilon) = C\left(\exp\left(-cn\epsilon^2\right) I\{\epsilon \le 1\} + n\,(n\epsilon)^{-(\beta - \eta)} I\{\epsilon > 1\}\right),$$

where $\eta$ can be chosen arbitrarily from $(0, \beta)$, while $C, c$ depend on the parameters $\beta, \eta$ and $\top$ specified in (C2).

Proof. The lemma follows directly by applying Theorem 2 in [Fournier and Guillin, 2015] to the random variable $X - \mu$, and noting from (2) that the Wasserstein distance remains invariant if the same constant is added to both random variables.

3 Conditional Value-at-Risk

We now introduce the notion of CVaR, a risk measure that is popular in financial applications.

Definition 4. The CVaR at level $\alpha \in (0, 1)$ for a r.v. $X$ is defined by

$$C_{\alpha}(X) = \inf_{\xi} \left\{ \xi + \frac{1}{(1 - \alpha)} E(X - \xi)^{+} \right\}, \quad \text{where } (y)^{+} = \max(y, 0).$$

It is well known (see [Rockafellar and Uryasev, 2000]) that the infimum in the definition of CVaR above is achieved at $\xi = \mathrm{VaR}_{\alpha}(X)$, where $\mathrm{VaR}_{\alpha}(X) = F^{-1}(\alpha)$ is the value-at-risk of the random variable $X$ at confidence level $\alpha$. Thus CVaR may also be written alternatively as given, for instance, in [Kolla et al., 2019b].
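As a side illustration (not taken from the paper), the variational formula in Definition 4, applied to an empirical sample, can be evaluated in closed form because the infimum is attained at the empirical $\alpha$-quantile. The sketch below does exactly that; the standard-Gaussian sample used for the sanity check is an assumption for the demo.

```python
import numpy as np

def cvar_estimate(x, alpha):
    """Evaluate the infimum in Definition 4 for the empirical distribution.
    The infimum is attained at the empirical VaR (the alpha-quantile),
    which yields a closed-form expression in terms of order statistics."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    var = x[int(np.ceil(n * alpha)) - 1]   # empirical VaR_alpha
    excess = np.maximum(x - var, 0.0)      # (X_i - xi)^+ evaluated at xi = VaR
    return var + excess.sum() / (n * (1.0 - alpha))

rng = np.random.default_rng(1)
sample = rng.normal(size=100_000)
# For a standard Gaussian, C_alpha(X) = phi(Phi^{-1}(alpha)) / (1 - alpha),
# roughly 2.06 at alpha = 0.95; the estimate below should land nearby.
print(cvar_estimate(sample, alpha=0.95))
```

This closed form is the same quantity as the plug-in estimate studied in the concentration results that follow.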
In the special case where $X$ has a continuous distribution, $C_{\alpha}(X)$ equals the expectation of $X$ conditioned on the event that $X$ exceeds $\mathrm{VaR}_{\alpha}(X)$.

All our results below pertain to i.i.d. samples $X_1, \ldots, X_n$ drawn from the distribution of $X$. Following Brown [2007], we estimate $C_{\alpha}(X)$ from such a sample by

$$c_{n,\alpha} = \inf_{\xi} \left\{ \xi + \frac{1}{n(1 - \alpha)} \sum_{i=1}^{n} (X_i - \xi)^{+} \right\}. \tag{6}$$

We now provide a concentration bound for the empirical CVaR estimate (6), by relating the estimation error $|c_{n,\alpha} - C_{\alpha}(X)|$ to the Wasserstein distance between the true and empirical distribution functions, and subsequently invoking Lemma 1, which bounds the Wasserstein distance between these two distributions. The proof is given in Section 5 of Bhat and Prashanth [2019].

Proposition 1. Suppose $X$ either satisfies (C1) for some $\beta > 1$ or satisfies (C2) for some $\beta > 2$. Under (C1), for any $\epsilon > 0$, we have

$$P(|c_{n,\alpha} - C_{\alpha}(X)| > \epsilon) \le C\left[\exp\left[-cn(1 - \alpha)^2 \epsilon^2\right] I\{\epsilon \le 1\} + \exp\left[-cn(1 - \alpha)^{\beta} \epsilon^{\beta}\right] I\{\epsilon > 1\}\right].$$

Under (C2), for any $\epsilon > 0$, we have

$$P(|c_{n,\alpha} - C_{\alpha}(X)| > \epsilon) \le C\left[\exp\left[-cn(1 - \alpha)^2 \epsilon^2\right] I\{\epsilon \le 1\} + n\left(n(1 - \alpha)\epsilon\right)^{-(\beta - \eta)} I\{\epsilon > 1\}\right].$$

In the above, the constants $C$, $c$ and $\eta$ are as in Lemma 1.

The following corollary, which specializes Proposition 1 to sub-Gaussian r.v.s, is immediate, as sub-Gaussian random variables satisfy (C1) with $\beta = 2$.

Corollary 1. For a sub-Gaussian r.v.
$X$, we have

$$P(|c_{n,\alpha} - C_{\alpha}(X)| > \epsilon) \le 2C \exp\left(-cn(1 - \alpha)^2 \epsilon^2\right), \quad \text{for any } \epsilon \ge 0,$$

where $C, c$ are constants that depend on the sub-Gaussianity parameter $\sigma$.

In terms of dependence on $n$ and $\epsilon$, the tail bound above is better than the one-sided concentration bound in [Kolla et al., 2019b]. In fact, the dependence on $n$ and $\epsilon$ matches that in the case of bounded distributions (cf. [Brown, 2007, Wang and Gao, 2010]).

The case of sub-exponential distributions can be handled by specializing the second result in Proposition 1. In particular, observing that sub-exponential distributions satisfy (C2) for any $\beta \ge 2$, while Proposition 1 requires $\beta > 2$ in case (ii), we obtain the following bound:

Corollary 2. For a sub-exponential r.v. $X$, for any $\epsilon \ge 0$, we have

$$P(|c_{n,\alpha} - C_{\alpha}(X)| > \epsilon) \le C\left[\exp\left[-cn(1 - \alpha)^2 \epsilon^2\right] I\{\epsilon \le 1\} + n\left[n(1 - \alpha)\epsilon\right]^{\eta - 3} I\{\epsilon > 1\}\right],$$

where $C$, $c$ and $\eta$ are as in Lemma 1.

For small deviations, i.e., $\epsilon \le 1$, the bound above is satisfactory, as the tail decay matches that of a Gaussian r.v. with constant variance. On the other hand, for large $\epsilon$, the second term exhibits polynomial decay. This polynomial term is not an artifact of our analysis; rather, it relates to the rate obtained in case (ii) of Lemma 1. Sub-exponential distributions satisfy an exponential moment bound with $\beta = 1$, and for this case, the authors in [Fournier and Guillin, 2015] remark that they were not able to obtain a satisfactory concentration result. Recently, Prashanth et al. [2019] have derived an improved bound for the sub-exponential case using a technique not based on the Wasserstein distance.

4 Spectral risk measures

Spectral risk measures are a generalization of CVaR.
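To make this generalization concrete, here is an illustrative sketch (not from the paper): a spectral risk measure is a weighted average of the distribution's quantiles, and discretizing such a weighted average over the empirical quantile function recovers CVaR for the tail-uniform weighting used below. The function names, the grid size, and the Gaussian sample are all assumptions made for the demo.

```python
import numpy as np

def spectral_risk(x, phi, m=20_000):
    """Discretize a weighted average of empirical quantiles: the integral of
    phi(beta) * F_n^{-1}(beta) over beta in (0, 1), on a midpoint grid."""
    betas = (np.arange(m) + 0.5) / m
    quantiles = np.quantile(np.asarray(x, dtype=float), betas)
    return float(np.mean(phi(betas) * quantiles))

alpha = 0.95
# CVaR corresponds to the weighting that is uniform on the upper tail:
phi_cvar = lambda b: (b >= alpha) / (1.0 - alpha)

rng = np.random.default_rng(2)
sample = rng.normal(size=100_000)
print(spectral_risk(sample, phi_cvar))  # close to the CVaR_0.95 of the sample
```

Other choices of the weighting recover other members of the spectral family, which is the point of the definition given next.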
Given a weighting function $\phi : [0, 1] \to [0, \infty)$, the spectral risk measure $M_{\phi}$ associated with $\phi$ is defined by

$$M_{\phi}(X) = \int_0^1 \phi(\beta) F^{-1}(\beta) \, d\beta, \tag{7}$$

where $X$ is a random variable with CDF $F$. If the weighting function, also known as the risk spectrum, is increasing and integrates to 1, then $M_{\phi}$ is a coherent risk measure like CVaR. In fact, CVaR is itself a special case of (7), with $C_{\alpha}(X) = M_{\phi}(X)$ for the risk spectrum $\phi(\beta) = (1 - \alpha)^{-1} I\{\beta \ge \alpha\}$ (see Acerbi [2002] and Dowd and Blake [2006] for details).

Given an i.i.d. sample $X_1, \ldots, X_n$ drawn from the CDF $F$ of a random variable $X$, a natural empirical estimate of the spectral risk measure $M_{\phi}(X)$ of $X$ is

$$m_{n,\phi} = \int_0^1 \phi(\beta) F_n^{-1}(\beta) \, d\beta. \tag{8}$$

In this section, we restrict ourselves to a spectral risk measure $M_{\phi}$ whose associated risk spectrum $\phi$ is bounded. Specifically, we assume that $|\phi(\beta)| \le K$ for all $\beta \in [0, 1]$, for some $K > 0$. It immediately follows from (7) and (2) that, if $X$ and $Y$ are random variables with CDFs $F_1$ and $F_2$, then

$$|M_{\phi}(X) - M_{\phi}(Y)| \le K W_1(F_1, F_2). \tag{9}$$

On noting from (8) that the empirical estimate $m_{n,\phi}$ of $M_{\phi}(X)$ is simply the spectral risk measure $M_{\phi}$ applied to a random variable whose CDF is $F_n$, we conclude from (9) that

$$|M_{\phi}(X) - m_{n,\phi}| \le K W_1(F, F_n). \tag{10}$$

Equation (10) relates the estimation error $|M_{\phi}(X) - m_{n,\phi}|$ to the Wasserstein distance between the true and empirical CDFs of $X$. As in the case of CVaR, invoking Lemma 1 provides concentration bounds for the empirical spectral risk measure estimate (8). The detailed proof is available in Section 5 of Bhat and Prashanth [2019].

Proposition 2. Suppose $X$ either satisfies (C1) for some $\beta > 1$ or satisfies (C2) for some $\beta > 2$. Let
Let\nK > 0 and let \u03c6 : [0, 1] \u2192 [0, K] be a risk spectrum for some K > 0. Under (C1), for any \u0001 > 0,\nwe have\nP (|mn,\u03c6 \u2212 M\u03c6(X)| > \u0001) \u2264 C\n\nI{\u0001 \u2264 1} + exp\n\nI{\u0001 > 1}\n\n\u2212cn\n\n\u2212cn\n\nexp\n\n.\n\n(cid:20)\n(cid:20)\n\n(cid:20)\n(cid:20)\n\n(cid:111)2(cid:21)\n(cid:110) \u0001\n(cid:111)2(cid:21)\n(cid:110) \u0001\n\nK\n\nK\n\n(cid:20)\n(cid:110) \u0001\n\nK\n\n(cid:16)\n\nn\n\n(cid:111)\u03b2(cid:21)\n\n(cid:21)\n(cid:110) \u0001\n(cid:111)(cid:17)\u2212(\u03b2\u2212\u03b7)/p I{\u0001 > 1}\n\nK\n\n(cid:21)\n\n.\n\nUnder (C2), for any \u0001 > 0, we have\nP (|mn,\u03c6 \u2212 M\u03c6(X)| > \u0001) \u2264 C\n\nexp\n\n\u2212cn\n\nI{\u0001 \u2264 1} + n\n\nIn the above, the constants C, c and \u03b7 are as in Lemma 1.\n\n5\n\n\fThe following corollary specializing Proposition 2 to sub-Gaussian random r.v.s. is immediate, as\nsub-Gaussian random variables satisfy (C1) with \u03b2 = 2.\nCorollary 3. For a sub-Gaussian r.v. X and a risk spectrum as in Proposition 2, we have\n\nP (|mn,\u03c6 \u2212 M\u03c6(X)| > \u0001) \u2264 2C exp(cid:0)\u2212cn\u00012/K 2(cid:1) , for any \u0001 \u2265 0,\n\nwhere C, c are constants that depend on \u03c3.\n\nIt is possible to specialize Proposition 2 to the case of sub-exponential random variables to obtain a\ncorollary similar to Corollary 2. However, in the interests of space, we do not present it here.\nTechnically speaking, the concentration bounds for CVaR from Section 3 follow from the results of\nthis section, since CVaR is a special case of a spectral risk measure. However, the proof technique of\nSection 3 uses a different characterization of the Wassserstein distance, and is based on a different\nformula for CVaR. We therefore believe that the independent proofs given for the results of Section 3\nare interesting in their own right.\n\n5 CPT-value estimation\n\nFor any r.v. 
$X$, the CPT-value is defined as

$$\mathcal{C}(X) = \int_0^{\infty} w^{+}\left(P\left(u^{+}(X) > z\right)\right) dz - \int_0^{\infty} w^{-}\left(P\left(u^{-}(X) > z\right)\right) dz. \tag{11}$$

Let us deconstruct the above definition. The functions $u^{+}, u^{-} : \mathbb{R} \to \mathbb{R}_{+}$ are utility functions that are assumed to be continuous, with $u^{+}(x) = 0$ when $x \le 0$ and increasing otherwise, and with $u^{-}(x) = 0$ when $x \ge 0$ and decreasing otherwise. The utility functions capture the human inclination to play safe with gains and take risks with losses; see Figure 1. Second, $w^{+}, w^{-} : [0, 1] \to [0, 1]$ are weight functions, which are assumed to be continuous, non-decreasing, and to satisfy $w^{+}(0) = w^{-}(0) = 0$ and $w^{+}(1) = w^{-}(1) = 1$. The weight functions capture the human inclination to view probabilities in a non-linear fashion. Tversky and Kahneman [1992], Barberis [2013] recommend the following choices for $w^{+}$ and $w^{-}$, based on inference from experiments involving human subjects:

$$w^{+}(p) = \frac{p^{0.61}}{\left(p^{0.61} + (1 - p)^{0.61}\right)^{1/0.61}}, \quad \text{and} \quad w^{-}(p) = \frac{p^{0.69}}{\left(p^{0.69} + (1 - p)^{0.69}\right)^{1/0.69}}. \tag{12}$$

Figure 1: Utility functions $u^{+}$ (gains) and $-u^{-}$ (losses).
Figure 2: Weight function $w(p) = p^{0.61}/(p^{0.61} + (1 - p)^{0.61})^{1/0.61}$ from Tversky and Kahneman [1992], plotted against the identity $w(p) = p$.

We now recall the CPT-value estimation scheme proposed in [Prashanth et al., 2016]. Let $X_i$, $i = 1, \ldots, n$ denote $n$ samples from the distribution of $X$. The EDFs for $u^{+}(X)$ and $u^{-}(X)$, for any given real-valued functions $u^{+}$ and $u^{-}$, are defined as follows:

$$\hat{F}_n^{+}(x) = \frac{1}{n} \sum_{i=1}^{n} I\{u^{+}(X_i) \le x\}, \quad \hat{F}_n^{-}(x) = \frac{1}{n} \sum_{i=1}^{n} I\{u^{-}(X_i) \le x\}.$$

Using these EDFs, the CPT-value is estimated as follows:

$$\mathcal{C}_n = \int_0^{\infty} w^{+}\left(1 - \hat{F}_n^{+}(x)\right) dx - \int_0^{\infty} w^{-}\left(1 - \hat{F}_n^{-}(x)\right) dx. \tag{13}$$

Notice that we have substituted the complementary EDFs $\left(1 - \hat{F}_n^{+}(x)\right)$ and $\left(1 - \hat{F}_n^{-}(x)\right)$ for $P(u^{+}(X) > x)$ and $P(u^{-}(X) > x)$, respectively, in (11), and then integrated the weight functions composed with the complementary EDFs. As shown in Section III of [Prashanth et al., 2016], the first and second integrals in (13) can be easily computed using the order statistics $\{X_{(1)}, \ldots, X_{(n)}\}$.

For the purpose of analysis, as in [Cheng et al., 2018], we make the following assumption:

(C3) The weight functions $w^{+}, w^{-}$ are $L$-Hölder continuous of order $\alpha \in (0, 1)$, for some constant $L > 0$.

In this paper, we are interested in deriving a concentration bound for the estimator in (13). To put things in context, in [Cheng et al., 2018], the authors derive a concentration bound assuming that the underlying distribution has bounded support, and for this purpose they employ the Dvoretzky-Kiefer-Wolfowitz (DKW) theorem (cf. Chapter 2 of [Wasserman, 2015]). Interestingly, we are able to provide a matching bound for the case of distributions with bounded support, using a proof technique that relates the estimation error $|\mathcal{C}_n - \mathcal{C}(X)|$ to the Wasserstein distance between the empirical and true CDFs; this is the content of the proposition below (see Section 5 of Bhat and Prashanth [2019] for the proof).

Proposition 3. (CPT concentration for bounded r.v.s) Let $X_1, \ldots, X_n$ be i.i.d. samples of a r.v. $X$ that is bounded a.s. in $[-T_1, T_2]$, where $T_1, T_2 \ge 0$ and at least one of $T_1, T_2$ is positive. Let $T \triangleq \max\{u^{+}(T_2), u^{-}(-T_1)\}$.
Then, under (C3), we have

$$P(|\mathcal{C}_n - \mathcal{C}(X)| > \epsilon) \le 2B\left(n, \left[\frac{\epsilon}{2LT^{1-\alpha}}\right]^{1/\alpha}\right), \quad \text{for any } \epsilon \ge 0,$$

where $B(\cdot, \cdot)$ is as given in (i) of Lemma 1 with $\beta = 2$.

From the form of $B(\cdot, \cdot)$ in Lemma 1, it is apparent that $|\mathcal{C}_n - \mathcal{C}(X)| < \epsilon$ with probability $1 - \delta$ if the number of samples $n$ is of the order $O\left(1/\epsilon^{2/\alpha} \log\left(\frac{1}{\delta}\right)\right)$, for any $\delta \in (0, 1)$.

Next, we provide a CPT concentration result for the case when the underlying r.v. is unbounded, but sub-Gaussian. For this case, we consider a modified CPT-value estimator based on truncation, namely,

$$\tilde{\mathcal{C}}_n = \int_0^{\tau_n} w^{+}\left(1 - \hat{F}_n^{+}(z)\right) dz - \int_0^{\tau_n} w^{-}\left(1 - \hat{F}_n^{-}(z)\right) dz,$$

where the sample-size-dependent truncation threshold $\tau_n$ is specified in the result below. The proof is available in Bhat and Prashanth [2019].

Proposition 4. (CPT concentration for sub-Gaussian r.v.s) Let $X_1, \ldots, X_n$ be i.i.d. samples from the distribution of $X$. Suppose that $u^{+}(X)$ and $u^{-}(X)$ are sub-Gaussian r.v.s with parameter $\sigma$. Set $\tau_n = \sigma\left(\sqrt{\log n} + \sqrt{\log \log n}\right)$ for all $n \ge 1$. Then, for all $n$ satisfying $\sigma\left(\sqrt{\log n} + \sqrt{\log \log n}\right) > \max\left(E(u^{+}(X)), E(u^{-}(X))\right) + 1$, we have

$$P\left(\left|\tilde{\mathcal{C}}_n - \mathcal{C}(X)\right| > \epsilon\right) \le 2C \exp\left(-cn\left(\frac{\epsilon - \frac{8L\sigma^2}{\alpha n^{\alpha/2}}}{L\sqrt{\log n}}\right)^{2/\alpha}\right) \quad \text{for every } \epsilon > \frac{8L\sigma^2}{\alpha n^{\alpha/2}},$$

where $C, c$ are constants that depend on the sub-Gaussianity parameter $\sigma$.

The corresponding bound provided in Proposition 3 of Cheng et al. [2018] is $2n e^{-n^{\alpha/(2+\alpha)} \left(\frac{\epsilon}{2H}\right)^{2/\alpha}} + 2e^{-n^{\alpha/(2+\alpha)}}$, and it is apparent that the bound we obtain is significantly improved.

6 CVaR-sensitive bandits

The concentration bound for CVaR estimation in Proposition 1 opens avenues for bandit applications. We illustrate this claim by using the regret minimization framework in a stochastic $K$-armed bandit problem, with an objective based on CVaR. While CVaR optimization has been considered in a bandit setting in the literature (cf. Galichet et al. [2013]), the underlying arms' distributions there have bounded support. We relax this assumption, and consider the case of sub-Gaussian distributions for the $K$ arms. The tail bounds in Kolla et al. [2019b] and Kolla et al. [2019a] do not allow a bandit application, because forming the confidence term (required for UCB-type algorithms) using their bound would require knowledge of the density in a neighborhood of the true VaR. In contrast, the constants in our bounds depend only on the sub-Gaussian parameter $\sigma$, and several classic MAB algorithms (including UCB) assume this information.

Suppose we are given $K$ arms with unknown distributions $P_i$, $i = 1, \ldots, K$. The interaction of the bandit algorithm with the environment proceeds, over $n$ rounds, as follows: (i) select an arm It ∈ {1, . . .
, K}; (ii) observe a sample cost drawn from the distribution $P_{I_t}$ corresponding to the arm $I_t$. Let $C_{\alpha}(i)$ denote the CVaR, at confidence level $\alpha \in (0, 1)$, of the distribution $P_i$ corresponding to arm $i$, for $i = 1, \ldots, K$. Let $C^{*} = \min_{i=1,\ldots,K} C_{\alpha}(i)$ denote the lowest CVaR among the $K$ distributions, and let $\Delta_i = C_{\alpha}(i) - C^{*}$ denote the gap between the CVaR of arm $i$ and that of the best arm.

The classic objective in a bandit problem is to find the arm with the lowest expected value. We consider an alternative formulation, where the goal is to find the arm with the lowest CVaR. Using the notion of regret, this objective is formalized as follows:

$$R_n = \sum_{i=1}^{K} C_{\alpha}(i) T_i(n) - nC^{*} = \sum_{i=1}^{K} T_i(n) \Delta_i,$$

where $T_i(n) = \sum_{t=1}^{n} I\{I_t = i\}$ is the number of pulls of arm $i$ up to time instant $n$.

Next, we present CVaR-LCB, a straightforward adaptation of the well-known UCB algorithm [Auer et al., 2002] to an objective based on CVaR. The algorithm plays each arm once in the initialization phase, and in each of the remaining rounds $t = K + 1, \ldots, n$ plays the arm $I_t$ with the lowest lower-confidence-bound value, that is, $I_t = \arg\min_{i=1,\ldots,K} \mathrm{LCB}_t(i)$, with

$$\mathrm{LCB}_t(i) = c_{i, T_i(t-1)} - \frac{2}{1 - \alpha} \sqrt{\frac{\log(Ct)}{c\, T_i(t-1)}},$$

where $c_{i, T_i(t-1)}$ is the empirical CVaR for arm $i$, computed using (6) from $T_i(t-1)$ samples, and $C, c$ are constants that depend on the sub-Gaussianity parameter $\sigma$ (see Corollary 1).

The result below bounds the regret of the CVaR-LCB algorithm; the proof is a straightforward adaptation of the one used to establish the regret bound of the regular UCB algorithm in [Auer et al., 2002] (see Bhat and Prashanth [2019] for details).

Theorem 1.
For a $K$-armed stochastic bandit problem where the arms' distributions are sub-Gaussian with parameter $\sigma = 1$, the regret $R_n$ of CVaR-LCB satisfies

$$E(R_n) \le \sum_{\{i : \Delta_i > 0\}} \frac{16 \log(Cn)}{(1 - \alpha)^2 \Delta_i} + K\left(1 + \frac{\pi^2}{3}\right) \max_{i} \Delta_i.$$

Further, $R_n$ satisfies the following bound, which does not scale inversely with the gaps:

$$E(R_n) \le \frac{8}{(1 - \alpha)} \sqrt{Kn \log(Cn)} + \left(\frac{\pi^2}{3} + 1\right) \sum_{i} \Delta_i.$$

7 Conclusions

We used finite-sample bounds from Fournier and Guillin [2015] for the Wasserstein distance between the empirical and true distributions of a random variable to derive two-sided concentration bounds for the error between the true and empirical CVaR, spectral risk measure, and CPT-value of a random variable. Our bounds hold for random variables that have either a finite exponential moment or a finite higher-order moment, and specialize nicely to sub-Gaussian and sub-exponential random variables. The bounds further improve upon previous similar results, which either gave one-sided bounds or applied only to bounded random variables. In addition, to illustrate the usefulness of our concentration bounds, we used our CVaR concentration bound to provide a regret-bound analysis for an algorithm for a bandit problem where the risk measure to be optimized is CVaR.

References

C. Acerbi. Spectral measures of risk: A coherent representation of subjective risk aversion. Journal of Banking and Finance, 26:1505–1518, 2002.

M. Allais. Le comportement de l'homme rationnel devant le risque: Critique des postulats et axiomes de l'école américaine. Econometrica, 21:503–546, 1953.

T. M. Apostol. Mathematical Analysis. Addison-Wesley, 2nd edition, 1974.

P. Artzner, F. Delbaen, J. Eber, and D. Heath. Coherent measures of risk.
Mathematical \ufb01nance, 9(3):\n\n203\u2013228, 1999.\n\nP. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem.\n\nMachine Learning, 47(2-3):235\u2013256, 2002.\n\nN. C. Barberis. Thirty years of prospect theory in economics: A review and assessment. Journal of\n\nEconomic Perspectives, pages 173\u2013196, 2013.\n\nS. P. Bhat and L. A. Prashanth. Concentration of risk measures: A Wasserstein distance approach.\n\narXiv preprint arXiv:1902.10709v2, 2019.\n\nD. B. Brown. Large deviations bounds for estimating conditional value-at-risk. Operations Research\n\nLetters, 35(6):722\u2013730, 2007.\n\nJ. Cheng, L. A. Prashanth, M. C. Fu, S. I. Marcus, and C. Szepesv\u00e1ri. Stochastic optimization in\na cumulative prospect theory framework. IEEE Transactions on Automatic Control, 2018. doi:\n10.1109/TAC.2018.2822658.\n\nK. Dowd and D. Blake. After VaR: The theory, estimation and insurance applications of quantile-\n\nbased risk measures. The Journal of Risk and Insurance, 73(2):193\u2013229, 2006.\n\nD. A. Edwards. On the Kantorovich\u2013Rubinstein Theorem. Expositiones Mathematicae, 29(4):\n\n387\u2013398, 2011.\n\nD. Ellsberg. Risk, ambiguity and the Savage\u2019s axioms. The Quarterly Journal of Economics, 75(4):\n\n643\u2013669, 1961.\n\nN. Fournier and A. Guillin. On the rate of convergence in Wasserstein distance of the empirical\n\nmeasure. Probability Theory and Related Fields, 162(3-4):707\u2013738, 2015.\n\nN. Galichet, M. Sebag, and O. Teytaud. Exploration vs exploitation vs safety: Risk-aware multi-armed\n\nbandits. In Asian Conference on Machine Learning, pages 245\u2013260, 2013.\n\nC. R. Givens and R. M. Shortt. A class of Wasserstein metrics for probability distributions. Michigan\n\nMathematical Journal, 31(2):231\u2013240, 1984.\n\nD. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica:\n\nJournal of the Econometric Society, pages 263\u2013291, 1979.\n\nR. K. Kolla, L. A. 
Prashanth, and K. P. Jagannathan. Risk-aware multi-armed bandits using conditional value-at-risk. CoRR, abs/1901.00997, 2019a.

R. K. Kolla, L. A. Prashanth, S. P. Bhat, and K. Jagannathan. Concentration bounds for empirical conditional value-at-risk: The unbounded case. Operations Research Letters, 47(1):16–20, 2019b.

L. A. Prashanth, J. Cheng, M. C. Fu, S. I. Marcus, and C. Szepesvári. Cumulative prospect theory meets reinforcement learning: Prediction and control. In International Conference on Machine Learning, pages 1406–1415, 2016.

L. A. Prashanth, K. Jagannathan, and R. K. Kolla. Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions, 2019.

R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2(3):21–41, 2000.

A. Tversky and D. Kahneman. Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4):297–323, 1992.

S. S. Vallander. Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability and its Applications, 18(4):784–786, 1974.

C. Villani. Optimal Transport: Old and New, volume 338. Springer Science & Business Media, 2008.

M. J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019.

Y. Wang and F. Gao. Deviation inequalities for an estimator of the conditional value-at-risk. Operations Research Letters, 38(3):236–239, 2010.

L. A. Wasserman. All of Nonparametric Statistics. Springer, 2015.
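As a concrete illustration of the CVaR-LCB procedure analyzed in Theorem 1, the following is a minimal Python sketch, not the authors' code. It uses the standard order-statistic estimator of empirical CVaR, and the constants C and c (which in the paper depend on the sub-Gaussianity parameter σ via Corollary 1) are set to placeholder values.

```python
import numpy as np


def empirical_cvar(samples, alpha):
    """Standard empirical CVaR of a loss sample at confidence level alpha:
    the ceil(n*alpha)-th order statistic (empirical VaR) plus the average
    excess above it, scaled by 1/(1 - alpha)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    var_hat = x[int(np.ceil(n * alpha)) - 1]  # empirical VaR
    return var_hat + np.maximum(x - var_hat, 0.0).sum() / (n * (1.0 - alpha))


def cvar_lcb(arm_samplers, n, alpha, C=np.e, c=1.0):
    """CVaR-LCB: play each arm once, then repeatedly pull the arm minimizing
    the index  empirical CVaR - (2/(1-alpha)) * sqrt(log(C*t) / (c*T_i)).
    C and c are placeholder values here.  Returns the pull counts T_i(n)."""
    K = len(arm_samplers)
    # Initialization phase: one sample per arm.
    samples = [[arm_samplers[i]()] for i in range(K)]
    for t in range(K + 1, n + 1):
        def lcb(i):
            T_i = len(samples[i])
            bonus = (2.0 / (1.0 - alpha)) * np.sqrt(np.log(C * t) / (c * T_i))
            return empirical_cvar(samples[i], alpha) - bonus
        i_t = min(range(K), key=lcb)  # arm with the lowest LCB value
        samples[i_t].append(arm_samplers[i_t]())
    return [len(s) for s in samples]
```

Run on two Gaussian arms with well-separated CVaR values, the arm with the smaller CVaR accumulates almost all of the n pulls, in line with the gap-dependent bound of Theorem 1.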