{"title": "Capacity Bounded Differential Privacy", "book": "Advances in Neural Information Processing Systems", "page_first": 3474, "page_last": 3483, "abstract": "Differential privacy, a notion of algorithmic stability, is a gold standard for \nmeasuring the additional risk an algorithm's output poses to the privacy of a\nsingle record in the dataset. Differential privacy is defined as the distance\nbetween the output distribution of an algorithm on neighboring datasets that\ndiffer in one entry. In this work, we present a novel relaxation of differential\nprivacy, capacity bounded differential privacy, where the adversary\nthat distinguishes output distributions is assumed to be\ncapacity-bounded -- i.e. bounded not in computational power, but in\nterms of the function class from which their attack algorithm is drawn. We model\nadversaries in terms of restricted f-divergences between probability\ndistributions, and study properties of the definition and algorithms that\nsatisfy them.", "full_text": "Capacity Bounded Differential Privacy\n\nKamalika Chaudhuri\n\nUC San Diego\n\nJacob Imola\nUC San Diego\n\nAshwin Machanavajjhala\n\nDuke University\n\nkamalika@cs.ucsd.edu\n\njimola@eng.ucsd.edu\n\nashwin@cs.duke.edu\n\nAbstract\n\nDifferential privacy has emerged as the gold standard for measuring the risk posed\nby an algorithm\u2019s output to the privacy of a single individual in a dataset. It is\nde\ufb01ned as the worst-case distance between the output distributions of an algorithm\nthat is run on inputs that differ by a single person. In this work, we present a novel\nrelaxation of differential privacy, capacity bounded differential privacy, where the\nadversary that distinguishes the output distributions is assumed to be capacity-\nbounded \u2013 i.e. bounded not in computational power, but in terms of the function\nclass from which their attack algorithm is drawn. 
We model adversaries of this form using restricted f-divergences between probability distributions, and study properties of the definition and algorithms that satisfy them. Our results demonstrate that these definitions possess a number of interesting properties enjoyed by differential privacy and some of its existing relaxations; additionally, common mechanisms such as the Laplace and Gaussian mechanisms enjoy better privacy guarantees for the same added noise under these definitions.

1 Introduction

Differential privacy [8] has emerged as a gold standard for measuring the privacy risk posed by algorithms analyzing sensitive data. A randomized algorithm satisfies differential privacy if an arbitrarily powerful attacker is unable to distinguish between the output distributions of the algorithm when the inputs are two datasets that differ in the private value of a single person. This provides a guarantee that the additional disclosure risk to a single person in the data posed by a differentially private algorithm is limited, even if the attacker has access to side information. However, a body of prior work [28, 3, 17, 1] has shown that this strong privacy guarantee comes at a cost: for many machine-learning tasks, differentially private algorithms require a much larger number of samples to achieve the same accuracy than is needed without privacy.

Prior work has considered relaxing differential privacy in a number of different ways. Pufferfish [16] and Blowfish [12] generalize differential privacy by restricting the properties of an individual that should not be inferred by the attacker, as well as explicitly enumerating the side information available to the adversary. Renyi- and KL-differential privacy [23, 31] measure privacy loss as the α-Renyi and KL-divergence between the output distributions, respectively.
The original differential privacy definition measures privacy as a max-divergence (or α-Renyi divergence with α → ∞). Computational differential privacy (CDP) [24] considers a computationally bounded attacker, and aims to ensure that the output distributions are computationally indistinguishable. These three approaches are orthogonal to one another, as they generalize or relax different aspects of the privacy definition.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

In this paper, we consider a novel approach to relaxing differential privacy: we restrict the adversary to "attack" or post-process the output of a private algorithm using functions drawn from a restricted function class, and we show how to quantitatively calculate privacy losses against particular function classes. These adversaries, which we call capacity bounded, can be used to model two kinds of application scenarios. The first is where the attacker is machine learnt and lies in some known space of functions (e.g., all linear functions, linear classifiers, etc.). The second is a user under a data-usage contract that restricts how the output of a private algorithm can be used. If the contract stipulates that the user can only compute a certain class of functions on the output, then a privacy guarantee of this form ensures that no privacy violation can occur if users obey their contracts. Showing how to quantify privacy loss in these settings allows (a) better decisions in cases where we expect adversaries to be bounded in what they can do – for example, automated adversaries or adversaries under a data-usage contract – and (b) better design of data-usage contracts.
Unlike computational DP, where computationally bounded adversaries do not meaningfully relax the privacy definition in the typical centralized differential privacy model [11], we believe that capacity bounded adversaries relax the definition enough to permit more useful algorithms, and form a natural and interesting class of adversaries.

The first challenge is how to model these adversaries. We begin by showing that privacy against capacity bounded adversaries can be cleanly modeled through the restricted divergences framework [21, 20, 26] that has recently been used to build a theory for generative adversarial networks. This gives us a notion of (H, Γ)-capacity bounded differential privacy, where the privacy loss is measured in terms of a divergence Γ (e.g., Renyi) between the output distributions of a mechanism on datasets that differ by a single person, restricted to functions in H (e.g., lin, the space of all linear functions).

We next investigate properties of these privacy definitions, and show that they enjoy many of the good properties of differential privacy and its relaxations – convexity, graceful composition, as well as post-processing invariance with respect to certain classes of functions. We analyze well-known privacy mechanisms, such as the Laplace and the Gaussian mechanism, under (lin, KL) and (lin, Renyi) capacity bounded privacy – where the adversaries are the class of all linear functions. We show that restricting the capacity of the adversary does provide improvements in the privacy guarantee in many cases. We then use this to demonstrate that the popular Matrix Mechanism [18, 19, 22] enjoys an improved privacy guarantee under the capacity bounded definition.

We conclude by showing some preliminary results that indicate that the capacity bounded definitions satisfy a form of algorithmic generalization.
Specifically, for every class of queries Q, there exists a (non-trivial) H such that an algorithm that answers queries in the class Q and is (H, KL)-capacity bounded private with parameter ε also ensures generalization with parameter O(√ε).

The main technical challenge we face is that little is known about the properties of restricted divergences. While unrestricted divergences such as KL and Renyi are now well understood as a result of more than fifty years of research in information theory, these restricted divergences are only beginning to be studied in their own right. A side effect of our work is that we advance the information geometry of these divergences, by establishing properties such as versions of Pinsker's Inequality and the Data Processing Inequality. We believe that these will be of independent interest to the community and will aid the development of the theory of GANs, where these divergences are also used.

2 Preliminaries

2.1 Privacy

Let D be a dataset, where each data point represents a single person's value. A randomized algorithm A satisfies differential privacy [8] if its output is insensitive to adding or removing a data point in its input D. We can define this privacy notion in terms of the Renyi divergence of two output distributions: A(D) – the distribution of outputs generated by A with input D – and A(D′), the distribution of outputs generated by A with input D′, where D and D′ differ by a single person's value [23]. Here, recall that the Renyi divergence of order α between distributions P and Q can be written as:

D_{R,α}(P, Q) = (1/(α − 1)) log(∫_x P(x)^α Q(x)^{1−α} dx).

Definition 1 (Renyi Differential Privacy).
A randomized algorithm A that operates on a dataset D is said to provide (α, ε)-Renyi differential privacy if for all D and D′ that differ by a single person's value, we have: D_{R,α}(A(D), A(D′)) ≤ ε.

When the order of the divergence α → ∞, we require the max-divergence of the two distributions to be bounded by ε – which is standard differential privacy [7]. When α → 1, D_{R,α} becomes the Kullback-Leibler (KL) divergence, and we get KL differential privacy [32].

2.2 Divergences and their Variational Forms

A popular class of divergences is Csiszár's f-divergences [5], defined as follows.

Definition 2. Let f be a lower semi-continuous convex function such that f(1) = 0, and let P and Q be two distributions over a probability space (Ω, Σ) such that P is absolutely continuous with respect to Q. Then, the f-divergence between P and Q, denoted by D_f(P, Q), is defined as:

D_f(P, Q) = ∫_Ω f(dP/dQ) dQ.

Examples of f-divergences include the KL divergence (f(t) = t log t), the total variation distance (f(t) = (1/2)|t − 1|) and the α-divergence (f(t) = (|t|^α − 1)/(α² − α)).

Given a function f with domain R, we use f* to denote its Fenchel conjugate: f*(s) = sup_{x∈R} x · s − f(x). [25] shows that f-divergences have a dual variational form:

D_f(P, Q) = sup_{h∈F} E_{x∼P}[h(x)] − E_{x∼Q}[f*(h(x))],   (1)

where F is the set of all functions over the domain of P and Q.

Restricted Divergences.
Given an f-divergence and a class of functions H ⊆ F, we can define a notion of an H-restricted f-divergence by maximizing in (1) over the more restricted class H instead of F:

D^H_f(P, Q) = sup_{h∈H} E_{x∼P}[h(x)] − E_{x∼Q}[f*(h(x))].   (2)

These restricted divergences have previously been considered in the context of, for example, Generative Adversarial Networks [26, 2, 20, 21].

While Renyi divergences are not f-divergences in general, we can also define restricted versions of them by going through the corresponding α-divergence – which, recall, is an f-divergence with f(t) = (|t|^α − 1)/(α² − α), and is related to the Renyi divergence by a closed-form equation [4]. Given a function class H, an order α, and two probability distributions P and Q, we can define the H-restricted Renyi divergence of order α by applying the same closed-form equation to the H-restricted α-divergence:

D^H_{R,α}(P, Q) = log(1 + α(α − 1) D^H_α(P, Q)) / (α − 1)   (3)

where D^H_α is the corresponding H-restricted α-divergence.

3 Capacity Bounded Differential Privacy

The existence of H-restricted divergences suggests a natural notion of privacy: when the adversary lies in a (restricted) function class H, we can take the supremum over the class H instead of F. This enforces that no adversary in the function class H can distinguish between A(D) and A(D′) beyond ε. We call these capacity bounded adversaries.

Definition 3 ((H, Γ)-Capacity Bounded Differential Privacy). Let H be a class of functions with domain X, and Γ be a divergence.
A mechanism A is said to offer (H, Γ)-capacity bounded privacy with parameter ε if for any two D and D′ that differ by a single person's value, the H-restricted Γ-divergence between A(D) and A(D′) is at most ε:

Γ^H(A(D), A(D′)) ≤ ε

When H is the class of all functions and Γ is a Renyi divergence, the definition reduces to Renyi differential privacy; capacity bounded privacy is thus a generalization of Renyi differential privacy.

Function Classes. The definition of capacity bounded privacy allows for an infinite number of variations corresponding to the class of adversaries H. An example of such a class is all linear adversaries over a feature space φ, which includes all linear regressors over φ. A second example is the class of all functions in a Reproducing Kernel Hilbert Space; these correspond to all kernel classifiers. A third interesting class is linear combinations of all ReLU functions; this corresponds to all two-layer neural networks. These function classes capture typical machine learnt adversaries, and designing mechanisms that satisfy capacity bounded DP with respect to these function classes is an interesting research direction.

4 Properties

The success of differential privacy has been attributed to its highly desirable properties that make it amenable to practical use. In particular, [15] proposes that any privacy definition should have three properties – convexity, post-processing invariance and graceful composition – all of which apply to differential privacy. We now show that many of these properties continue to hold for the capacity bounded definitions. The proofs appear in the Appendix.

Post-processing. Most notions of differential privacy satisfy post-processing invariance, which states that applying any function to the output of a private mechanism does not degrade the privacy guarantee.
We cannot expect post-processing invariance to hold with respect to all functions for capacity bounded privacy – otherwise, the definition would be equivalent to privacy against all adversaries! However, we can show that for any H and any Γ, (H, Γ)-capacity bounded differential privacy is preserved after post-processing if certain conditions on the function classes hold:

Theorem 1. Let Γ be an f-divergence or the Renyi divergence of order α > 1, and let H, G, and I be function classes such that for any g ∈ G and i ∈ I, i ∘ g ∈ H. If algorithm A satisfies (H, Γ)-capacity bounded privacy with parameter ε, then, for any g ∈ G, g ∘ A satisfies (I, Γ)-capacity bounded privacy with parameter ε.

Specifically, if I = H, then A is post-processing invariant. Theorem 1 is essentially a form of the popular Data Processing Inequality applied to restricted divergences; its proof appears in the Appendix and follows from the definition together with some algebra. An example of function classes G, H, and I that satisfy these conditions is when G, H, and I are linear functions, with G : R^s → R^d, H : R^s → R, and I : R^d → R.

Convexity. A second property is convexity [14], which states that if A and B are private mechanisms with privacy parameter ε, then so is a composite mechanism M that tosses a (data-independent) coin and chooses to run A with probability p and B with probability 1 − p. We show that convexity holds for (H, Γ)-capacity bounded privacy for any H and any f-divergence Γ.

Theorem 2. Let Γ be an f-divergence and let A and B be two mechanisms which have the same range and provide (H, Γ)-capacity bounded privacy with parameter ε. Let M be a mechanism which tosses an independent coin, and then executes mechanism A with probability λ and B with probability 1 − λ.
Then, M satisfies (H, Γ)-capacity bounded privacy with parameter ε.

We remark that while differential privacy and KL differential privacy satisfy convexity, (standard) Renyi differential privacy does not; it is not surprising that neither does its capacity bounded version. The proof uses the convexity of the function f in an f-divergence.

Composition. Broadly speaking, composition refers to how the privacy properties of algorithms applied multiple times relate to the privacy properties of the individual algorithms. Two styles of composition are usually considered – sequential and parallel.

A privacy definition is said to satisfy parallel composition if the privacy loss obtained by applying multiple algorithms to disjoint datasets is the maximum of the privacy losses of the individual algorithms. In particular, Renyi differential privacy of any order satisfies parallel composition. We show below that so does capacity bounded privacy.

Theorem 3. Let H1, H2 be two function classes that are convex and translation invariant. Let H be the function class

H = {h1 + h2 | h1 ∈ H1, h2 ∈ H2}

and let Γ be the KL divergence or the Renyi divergence of order α > 1. If mechanisms A and B satisfy (H1, Γ) and (H2, Γ) capacity bounded privacy with parameters ε1 and ε2 respectively, and if the datasets D1 and D2 are disjoint, then the combined release (A(D1), B(D2)) satisfies (H, Γ) capacity bounded privacy with parameter max(ε1, ε2).

In contrast, a privacy definition is said to compose sequentially if the privacy properties of algorithms that satisfy it degrade gracefully as the same dataset is used in multiple private releases. In particular, Renyi differential privacy satisfies sequential additive composition – if multiple private algorithms are applied to the same dataset, then their privacy parameters add up.
Divergence | Mechanism        | Condition     | Privacy Parameter, Linear Adversary                          | Privacy Parameter, Unrestricted
KL         | Laplace          |               | √(1+ε²) − 1 + log(1 − (√(1+ε²) − 1)²/ε²)                     | ε − 1 + e^{−ε}
KL         | Gaussian         |               | 1/(2σ²)                                                      | 1/(2σ²)
α-Renyi    | Laplace          | ε ≤ 1         | (1/(α−1)) log(1 + 2^{α−1} ε^α)                               | ≥ ε − log(2)/(α−1)
α-Renyi    | Gaussian         | 1/σ ≤ 1       | (1/(α−1)) log(1 + √(2π) (α−1)/σ^α)                           | α/(2σ²)
α-Renyi    | Laplace, d-dim   | ε‖v‖_α ≤ 1    | (1/(α−1)) log(1 + 2^{d(α−1)} (ε‖v‖_α)^α)                     | ≥ ε‖v‖₁ − d log(2)/(α−1)
α-Renyi    | Gaussian, d-dim  | ‖v‖_α/σ ≤ 1   | (1/(α−1)) log(1 + 2^{d(α−1)} √(π/2) (α−1) ‖v‖_α^α / σ^α)     | α‖v‖₂² / (2σ²)

Table 1: Privacy parameters of different mechanisms and divergences with a linear adversary and unrestricted. Proofs appear in the Appendix.

We show below that a similar result can be shown for (H, Γ)-capacity bounded privacy when Γ is the KL or the Renyi divergence, and H satisfies some mild conditions.

Theorem 4. Let H1 and H2 be two function classes that are convex, translation invariant, and include a constant function. Let H be the function class

H = {h1 + h2 | h1 ∈ H1, h2 ∈ H2}

and let Γ be the KL divergence or the Renyi divergence of order α > 1.
If mechanisms A and B satisfy (H1, Γ) and (H2, Γ) capacity bounded privacy with parameters ε1 and ε2 respectively, then the combined release (A, B) satisfies (H, Γ) capacity bounded privacy with parameter ε1 + ε2.

The proof relies heavily on the relationship between the restricted and unrestricted divergences, as shown in [21, 20, 9], and is provided in the Appendix. Observe that the conditions on H1 and H2 are rather mild, and admit a large number of interesting function classes. One such example of H is the set of ReLU neural networks with a linear output node, a common choice when performing neural network regression.

The composition guarantees offered by Theorem 4 are non-adaptive – the mechanisms A and B are known in advance, and B is not chosen as a function of the output of A. Whether fully general adaptive composition is possible for the capacity bounded definitions is left as an open question for future work.

5 Privacy Mechanisms

The definition of capacity bounded privacy allows for an infinite number of variations, corresponding to the class of adversaries H and the divergence Γ; exploring all of them is outside the scope of a single paper. For the sake of concreteness, we consider linear and (low-degree) polynomial adversaries for H, and the KL as well as Renyi divergences of order α for Γ. These correspond to cases where a linear or a low-degree polynomial function is used by an adversary to attack privacy.

A first sanity check is to see what kind of linear or polynomial guarantee is offered by a mechanism that directly releases a non-private value (without any added randomness).
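Restricted divergences such as those appearing in Table 1 can be estimated numerically by maximizing the variational objective (2) directly over the adversary class. The following sketch is our own illustration (the variable names, grids, and integration scheme are ours, not from the paper); it recovers the (lin, KL) parameter of the one-dimensional Laplace mechanism and compares it with the unrestricted KL parameter:

```python
import numpy as np

eps = 1.0  # Laplace mechanism: add Lap(0, 1/eps) noise to a sensitivity-1 query

# Output distributions on neighboring datasets: P = Lap(1, 1/eps), Q = Lap(0, 1/eps)
xs = np.arange(-25.0, 25.0, 0.01)
q = 0.5 * eps * np.exp(-eps * np.abs(xs))      # density of Q

# For KL, f(t) = t log t, so the Fenchel conjugate is f*(s) = exp(s - 1).
# Restricted to linear adversaries h(x) = a*x + c, the objective in (2) is
#   J(a, c) = E_P[a x + c] - E_Q[exp(a x + c - 1)] = a + c - E_Q[exp(a x + c - 1)],
# using the fact that the mean of P is 1.
best = -np.inf
for a in np.arange(-0.95, 0.96, 0.01):
    e_q = (q * np.exp(a * xs - 1.0)).sum() * 0.01   # E_Q[exp(a x - 1)]
    c = np.arange(0.0, 2.0, 0.01)
    best = max(best, (a + c - e_q * np.exp(c)).max())

# Closed forms from Table 1 for comparison
kl_lin = np.sqrt(1 + eps**2) - 1 + np.log(1 - (np.sqrt(1 + eps**2) - 1)**2 / eps**2)
kl_full = eps - 1 + np.exp(-eps)
print(best, kl_lin, kl_full)   # grid search ~0.226, matching kl_lin, below kl_full ~0.368
```

The grid search matches the closed form in Table 1 to within integration error, and stays strictly below the unrestricted KL parameter, as the table predicts.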
Such a mechanism offers no finite linear KL or Renyi differential privacy parameter – which is to be expected from any sensible privacy definition (see the Appendix).

We now look at the capacity bounded privacy properties of the familiar Laplace and Gaussian mechanisms, which form the building blocks for much of differential privacy. The bounds we wish to compare appear in Table 1.

Laplace Mechanism. The Laplace mechanism adds Lap(0, 1/ε) noise to a function with global sensitivity 1. In d dimensions, the mechanism adds d i.i.d. samples from Lap(0, 1/ε) to a function with L1 sensitivity 1. More generally, we consider functions whose global sensitivity along coordinate i is v_i. We let v = (v1, v2, . . . , vd).

Table 1 shows the (lin, KL)-capacity bounded privacy and KL-DP parameters for the Laplace mechanism. The former has a slightly smaller parameter than the latter.

Figure 1: (a) Plots of the (lin, Renyi) capacity bounded DP and Renyi-DP parameters for the Laplace mechanism when ε = 1; for (lin, Renyi), the upper bound and the exact value are shown. (b) Comparison of the exact values of the (poly, Renyi) capacity bounded DP parameters for the Laplace mechanism when ε = 1.

Table 1 also contains an upper bound on the (lin, Renyi) capacity bounded privacy parameter, and a lower bound on the Renyi-DP parameter. The exact value of the Renyi-DP parameter is:

(1/(α − 1)) log((1/2 + 1/(4α − 2)) e^{(α−1)ε} + (1/2 − 1/(4α − 2)) e^{−αε})   (4)

By multiplying by α − 1 and exponentiating, we see that the (lin, Renyi) upper bound grows with 1 + ε(2ε)^{α−1}, while the Renyi-DP lower bound grows with (eε)^{α−1}.
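The closed form (4) can be sanity-checked by computing the Renyi divergence between the Laplace mechanism's two output distributions by direct numerical integration. This is a quick check of ours, not code from the paper:

```python
import numpy as np

eps, alpha = 1.0, 2.0
xs = np.arange(-30.0, 30.0, 0.001)
p = 0.5 * eps * np.exp(-eps * np.abs(xs - 1.0))   # A(D):  Lap(1, 1/eps)
q = 0.5 * eps * np.exp(-eps * np.abs(xs))         # A(D'): Lap(0, 1/eps)

# Renyi divergence of order alpha: log(int p^alpha q^(1-alpha)) / (alpha - 1)
numeric = np.log((p**alpha * q**(1.0 - alpha)).sum() * 0.001) / (alpha - 1.0)

# The closed form (4) for the Renyi-DP parameter of the Laplace mechanism
closed = np.log((0.5 + 1.0 / (4 * alpha - 2)) * np.exp((alpha - 1) * eps)
                + (0.5 - 1.0 / (4 * alpha - 2)) * np.exp(-alpha * eps)) / (alpha - 1.0)
print(numeric, closed)   # both ~0.619 for alpha = 2, eps = 1
```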
Consequently, no matter what ε is, a moderately sized α will make the (lin, Renyi) upper bound smaller than the Renyi lower bound. Figure 1a plots the (lin, Renyi) upper bound, the exact Renyi-DP value (4), and the exact value of the (lin, Renyi) parameter, as functions of α when ε = 1. We see that the exact (lin, Renyi) parameter is always better than (4), although the upper bound may sometimes be worse; the upper bound overtakes the lower bound when α ≈ 3.3.

For the multidimensional Laplace mechanism, the story is the same. The (lin, Renyi) upper bound can now be thought of as a function of ε‖v‖_α, and the Renyi lower bound as a function of ε‖v‖₁. Because ‖v‖_α ≤ ‖v‖₁, we can replace ‖v‖_α with ‖v‖₁ in the (lin, Renyi) upper bound and repeat the analysis for the unidimensional case. Notice that our (lin, Renyi) upper bound is slightly better than applying composition d times to the unidimensional Laplace mechanism, which would introduce a multiplicative factor of d.

Figure 1b contains plots of the exact (poly, Renyi) parameters for polynomials of degree 1, 2, and 3, as functions of α when ε = 1. As we expect, as the polynomial complexity increases, the (poly, Renyi) parameters converge to the Renyi-DP parameter. This also provides an explanation for the counterintuitive observation that the (poly, Renyi) parameters eventually decrease with α: the polynomial function classes are too simple to distinguish the two distributions for larger α, but their ability to do so increases as the polynomial complexity increases.

Gaussian Mechanism. The Gaussian mechanism adds N(0, σ²) noise to a function with global sensitivity 1. In d dimensions, the mechanism adds N(0, σ²I_d) to a function with L2 sensitivity 1. More generally, we consider functions whose global sensitivity along coordinate i is v_i. We let v = (v1, v2, . . .
, vd).

Whereas the (lin, KL) parameter for the Laplace mechanism is a little better than its KL-DP parameter, Table 1 shows that for the Gaussian mechanism the two parameters are the same. This is because if P and Q are two Gaussians with equal variance, the function h that maximizes the variational formulation corresponding to the KL divergence is a linear function.

For Renyi capacity bounded privacy, the observations we make are nearly identical to those for the Laplace mechanism. The reader is referred to the Appendix for plots and specific details.

Matrix Mechanism. Now, we show how to use the bounds in Table 1 to obtain better capacity bounded parameters for a composite mechanism often used in practice: the Matrix mechanism [18, 19, 22]. The Matrix mechanism is a very general method of computing linear queries on a dataset, usually with less error than the Laplace mechanism. Given a dataset D ∈ Σ^m over a finite alphabet Σ of size n, we can form a vector of counts x ∈ R^n such that x_i contains how many times i appears in D. A linear query is a vector w ∈ R^n and has answer wᵀx. A set of d linear queries can then be given by a matrix W ∈ R^{d×n}, with the goal of computing Wx privately.

A naive way to do this is to use the Laplace mechanism in d dimensions to release x and then multiply by W. The key insight is that, for any A ∈ R^{s×n} of rank n, we can instead add noise to Ax and multiply the result by WA†. Here, A† denotes the pseudoinverse of A, which satisfies WA†A = W. The Laplace mechanism arises as the special case A = I; however, more carefully chosen strategy matrices A may be used to get privacy with less noise. This gives rise to the Matrix mechanism:

M_A(W, x, ε) = W A†(Ax + ‖A‖₁ Lap_s(0, 1/ε))

Here, ‖A‖₁ is the maximum L1 norm of any column of A. Prior work shows that this mechanism provides differential privacy and suggests different methods for picking A.
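The mechanism above can be sketched in a few lines. This is our own illustrative implementation (the helper names and the example query matrix are ours); taking A = I recovers the plain Laplace mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

def matrix_mechanism(W, A, x, eps):
    """Release W x by adding Laplace noise to A x and mapping back with W A^+."""
    A_pinv = np.linalg.pinv(A)            # pseudoinverse; W A^+ A = W when A has rank n
    l1 = np.abs(A).sum(axis=0).max()      # ||A||_1: maximum column L1 norm
    noise = rng.laplace(0.0, 1.0 / eps, size=A.shape[0])
    return W @ A_pinv @ (A @ x + l1 * noise)

# Two range queries over a histogram with four cells
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
x = np.array([10.0, 20.0, 30.0, 40.0])
out = matrix_mechanism(W, np.eye(4), x, eps=1.0)
print(out)   # a noisy version of the true answers [30, 70]
```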
Regardless of which A is chosen, we are able to provide a capacity bounded privacy parameter that is better than any known Renyi-DP analysis has shown:

Theorem 5 (Matrix Mechanism). Let x ∈ R^n be a data vector, W ∈ R^{d×n} be a query matrix, and A ∈ R^{s×n} be a strategy matrix. Then, releasing M_A(W, x, ε) offers (lin, Renyi) capacity bounded privacy with parameter at most

(1/(α − 1)) log(1 + 2^{s(α−1)} ε^α).

Note this is the same upper bound as for the s-dimensional Laplace mechanism; indeed, the proof works by applying post-processing to the Laplace mechanism.

6 Algorithmic Generalization

Overfitting to input data has long been the curse of many statistical and machine-learning methods; the harmful effects of overfitting can range from poor performance at deployment time all the way up to lack of reproducibility in scientific research due to p-hacking [13]. Motivated by these concerns, a recent line of work in machine learning investigates properties that algorithms and methods should possess so that they can automatically guarantee generalization [27, 10, 6, 29]. In this connection, differential privacy and many of its relaxations have been shown to be highly successful; it is known, for example, that if adaptive data analysis is done by a differentially private algorithm, then the results automatically possess certain generalization guarantees.

A natural question is whether these properties translate to capacity bounded differential privacy, and if so, under what conditions. We next investigate this question, and show that capacity bounded privacy does offer promise in this regard. A more detailed investigation is left for future work.

Problem Setting. More specifically, the problem setting is as follows [27, 10, 6, 29]. We are given as input a data set S = {x1, . . .
, xn} drawn from an (unknown) underlying distribution D over an instance space X, and a set of “statistical queries” Q; each statistical query q ∈ Q is a function q : X → [0, 1].

A data analyst M observes S, and then picks a query q_S ∈ Q based on her observation; we say that M generalizes well if the query q_S evaluated on S is close (in expectation) to q_S evaluated on a fresh sample from D; more formally, this happens when the generalization gap (1/n) Σ_{i=1}^n q_S(x_i) − E_{x∼D}[q_S(x)] is low.

Observe that if the query were picked without an a priori look at the data S, then the problem would be trivial and solved by a simple Chernoff bound. Bounding the generalization gap is thus challenging because the choice of q_S depends on S, and the difficulty lies in analyzing the behaviour of particular methods that make this choice.

Our Result. Prior work in generalization theory [27, 10, 6, 29] shows that if M possesses certain algorithmic stability properties – such as differential privacy as well as many of its relaxations and generalizations – then the gap is low. We next show that provided the adversarial function class H satisfies certain properties with respect to the statistical query class Q, (H, KL)-capacity bounded privacy also has good generalization properties.

Theorem 6 (Algorithmic Generalization). Let S be a sample of size n drawn from an underlying data distribution D over an instance space X, and let M be a (randomized) mechanism that takes as input S and outputs a query q_S in a class Q.
For any x ∈ X, define a function h_x : Q → [0, 1] as h_x(q) = q(x), and let H be any class of functions that includes {h_x | x ∈ X}. If the mechanism M satisfies (H, KL)-capacity bounded privacy with parameter ε, then, for every distribution D, we have:

| E_{S∼D,M} [ (1/n) Σ_{i=1}^n q_S(x_i) − E_{x∼D}[q_S(x)] ] | ≤ 8√ε.

We remark that the result would not hold for arbitrary (H, KL)-capacity bounded privacy; a condition that connects H to Q appears to be necessary. However, for specific distributions D, fewer conditions may be needed.

Observe also that Theorem 6 only provides a bound in expectation. Stronger guarantees – such as high-probability bounds as well as adaptive generalization bounds – are also known in the adaptive data analysis literature. While we believe similar bounds should be possible in our setting, proving them requires a variety of information-theoretic properties of the corresponding divergences, which are currently not available for restricted divergences. We leave a deeper investigation for future work.

Proof Ingredient: A Novel Pinsker-like Inequality. We remark that an ingredient in the proof of Theorem 6 is a novel Pinsker-like inequality for restricted KL divergences, which was previously unknown and is presented below (Theorem 7). We believe that this theorem may be of independent interest, and may find applications in the theory of generative adversarial networks, where restricted divergences are also used. We begin by defining an integral probability metric (IPM) [30] with respect to a function class H.

Definition 4.
Given a function class H, and any two distributions P and Q, the Integral Probability
Metric (IPM) with respect to H is defined as follows: IPMH(P, Q) = sup_{h∈H} |EP [h(x)] −
EQ[h(x)]|.
Examples of IPMs include the total variation distance, where H is the class of all functions with
range [0, 1], and the Wasserstein distance, where H is the class of all 1-Lipschitz functions. With this
definition in hand, we can now state our result.
Theorem 7 (Pinsker-like Inequality for Restricted KL Divergences). Let H be a convex class of
functions with range [−1, 1] that is translation invariant and closed under negation. Then, for any
P and Q such that P is absolutely continuous with respect to Q, we have:

IPMH(P, Q) ≤ 8 · √(KLH(P, Q)).

This theorem is an extended version of the Pinsker Inequality, which states that the total variation
distance satisfies TV(P, Q) ≤ √(2 KL(P, Q)); however, instead of connecting the total variation
distance and KL divergences, it connects IPMs and the corresponding restricted KL divergences.

7 Conclusion

We initiate a study of capacity bounded differential privacy – a relaxation of differential privacy
against adversaries in restricted function classes. We show how to model these adversaries cleanly
through the recent framework of restricted divergences. We then show that the definition satisfies
privacy axioms, and permits mechanisms that have higher utility (for the same level of privacy) than
regular KL or Renyi differential privacy when the adversary is limited to linear functions. Finally, we
show some preliminary results that indicate that these definitions offer good generalization guarantees.
There are many future directions. A deeper investigation into novel mechanisms that satisfy the
definitions, particularly for other function classes such as threshold and ReLU functions, remains open.
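The restricted adversaries studied here can be made concrete numerically. The following is a minimal sketch of ours, not code from the paper: it estimates the IPM of Definition 4 from samples, over a small finite class of clipped linear functions standing in for a capacity-bounded adversary; the particular distributions, weights, and clipping range are illustrative assumptions.

```python
import numpy as np

def ipm_estimate(xs, ys, h_class):
    """Empirical IPM (Definition 4): the supremum over h in a finite class H
    of |E_P[h(x)] - E_Q[h(x)]|, estimated from samples xs ~ P and ys ~ Q."""
    return max(abs(float(np.mean(h(xs)) - np.mean(h(ys)))) for h in h_class)

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=10_000)  # samples from P = N(0, 1)
ys = rng.normal(0.5, 1.0, size=10_000)  # samples from Q = N(0.5, 1)

# An illustrative capacity-bounded class: clipped linear functions with
# range [-1, 1] (these choices are ours, not the paper's).
h_class = [lambda x, w=w: np.clip(w * x, -1.0, 1.0)
           for w in np.linspace(-2.0, 2.0, 41)]

# Strictly positive: even this restricted adversary separates P from Q.
print(ipm_estimate(xs, ys, h_class))
```

With a richer class H the estimate grows toward the unrestricted total variation distance; shrinking H can only decrease it, which is the sense in which a capacity bound weakens the adversary.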
A
second question is a more detailed investigation into statistical generalization – such as generalization
in high probability and adaptivity – induced by these notions. Finally, our work motivates a deeper
exploration into the information geometry of adversarial divergences, which is of wider interest to
the community.

Acknowledgments.
We thank Shuang Liu and Arnab Kar for early discussions. KC and JI thank ONR under N00014-16-
1-261, UC Lab Fees under LFR 18-548554 and NSF under 1253942 and 1804829 for support. AM
was supported by the National Science Foundation under grants 1253327, 1408982; and by DARPA
and SPAWAR under contract N66001-15-C-4067.

References
[1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and
Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference
on Computer and Communications Security, CCS '16, pages 308–318, New York, NY, USA, 2016. ACM.

[2] Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in
generative adversarial nets (GANs). International Conference on Machine Learning, 2017.

[3] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk
minimization. J. Mach. Learn. Res., 12:1069–1109, July 2011.

[4] Andrzej Cichocki and Shun-ichi Amari. Families of alpha-, beta-, and gamma-divergences: Flexible and
robust measures of similarities. Entropy, 12(6):1532–1568, 2010.

[5] Imre Csiszár. Information-type measures of difference of probability distributions and indirect observation.
Studia Sci. Math. Hungar., 2:299–318, 1967.

[6] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. Preserv-
ing statistical validity in adaptive data analysis. CoRR, abs/1411.2664, 2014.

[7] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith.
Calibrating noise to sensitivity in
private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages 265–284.
Springer, 2006.

[8] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor.
Comput. Sci., 2014.

[9] Farzan Farnia and David Tse. A convex duality framework for GANs. CoRR, abs/1810.11740, 2018.

[10] Vitaly Feldman and Thomas Steinke. Generalization for adaptively-chosen estimators via stable median.
In Satyen Kale and Ohad Shamir, editors, Proceedings of the 30th Conference on Learning Theory, COLT
2017, Amsterdam, The Netherlands, 7-10 July 2017, volume 65 of Proceedings of Machine Learning
Research, pages 728–757. PMLR, 2017.

[11] Adam Groce, Jonathan Katz, and Arkady Yerukhimovich. Limits of computational differential privacy in
the client/server setting. In Proceedings of the 8th Conference on Theory of Cryptography, TCC '11, pages
417–431, 2011.

[12] Xi He, Ashwin Machanavajjhala, and Bolin Ding. Blowfish privacy: Tuning privacy-utility trade-offs
using policies. CoRR, abs/1312.3913, 2013.

[13] Megan L Head, Luke Holman, Rob Lanfear, Andrew T Kahn, and Michael D Jennions. The extent and
consequences of p-hacking in science. PLoS Biology, 13(3):e1002106, 2015.

[14] Daniel Kifer and Bing-Rong Lin. Towards an axiomatization of statistical privacy and utility. In Proceedings
of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems,
PODS '10, pages 147–158, New York, NY, USA, 2010. ACM.

[15] Daniel Kifer and Ashwin Machanavajjhala. A rigorous and customizable framework for privacy. In
Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems,
PODS '12, pages 77–88, New York, NY, USA, 2012. ACM.

[16] Daniel Kifer and Ashwin Machanavajjhala. Pufferfish: A framework for mathematical privacy definitions.
ACM Trans.
Database Syst., 39(1):3, 2014.

[17] Daniel Kifer, Adam D. Smith, and Abhradeep Thakurta. Private convex optimization for empirical risk
minimization with applications to high-dimensional regression. In Shie Mannor, Nathan Srebro, and
Robert C. Williamson, editors, COLT 2012 - The 25th Annual Conference on Learning Theory, June 25-27,
2012, Edinburgh, Scotland, volume 23 of JMLR Proceedings, pages 25.1–25.40. JMLR.org, 2012.

[18] Chao Li, Michael Hay, Vibhor Rastogi, Gerome Miklau, and Andrew McGregor. Optimizing linear
counting queries under differential privacy. In Jan Paredaens and Dirk Van Gucht, editors, Proceedings of
the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS
2010, June 6-11, 2010, Indianapolis, Indiana, USA, pages 123–134. ACM, 2010.

[19] Chao Li and Gerome Miklau. Measuring the achievable error of query sets under differential privacy.
CoRR, abs/1202.3399, 2012.

[20] Shuang Liu, Olivier Bousquet, and Kamalika Chaudhuri. Approximation and convergence properties of
generative adversarial learning. In Advances in Neural Information Processing Systems, pages 5545–5553,
2017.

[21] Shuang Liu and Kamalika Chaudhuri. The inductive bias of restricted f-GANs. arXiv preprint
arXiv:1809.04542, 2018.

[22] Ryan McKenna, Gerome Miklau, Michael Hay, and Ashwin Machanavajjhala. Optimizing error of
high-dimensional statistical queries under differential privacy. CoRR, abs/1808.03537, 2018.

[23] Ilya Mironov. Renyi differential privacy. In IEEE Computer Security Foundations Symposium (CSF), 2017.

[24] Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil Vadhan. Computational differential privacy. In
Proceedings of the 29th Annual International Cryptology Conference on Advances in Cryptology, CRYPTO
'09, pages 126–142, Berlin, Heidelberg, 2009. Springer-Verlag.

[25] XuanLong Nguyen, Martin J Wainwright, and Michael I Jordan.
Estimating divergence functionals and the
likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory, 56(11):5847–5861,
2010.

[26] Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: Training generative neural samplers using
variational divergence minimization. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett,
editors, Advances in Neural Information Processing Systems 29, pages 271–279. Curran Associates, Inc.,
2016.

[27] D. Russo and J. Zou. How much does your data exploration overfit? Controlling bias via information
usage. arXiv e-prints, November 2015.

[28] A. D. Sarwate and K. Chaudhuri. Signal processing and machine learning with differential privacy:
Algorithms and challenges for continuous data. IEEE Signal Processing Magazine, 30(5):86–94, Sep.
2013.

[29] Adam D. Smith. Information, privacy and stability in adaptive data analysis. CoRR, abs/1706.00820, 2017.

[30] Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Gert R. G. Lanckriet, and Bernhard Schölkopf.
A note on integral probability metrics and φ-divergences. CoRR, abs/0901.2698, 2009.

[31] Yu-Xiang Wang, Jing Lei, and Stephen E. Fienberg. On-average KL-privacy and its equivalence to
generalization for max-entropy mechanisms. arXiv e-prints, May 2016.

[32] Yu-Xiang Wang, Jing Lei, and Stephen E. Fienberg. On-average KL-privacy and its equivalence to
generalization for max-entropy mechanisms. In International Conference on Privacy in Statistical Databases,
pages 121–134. Springer, 2016.