{"title": "Extremal Mechanisms for Local Differential Privacy", "book": "Advances in Neural Information Processing Systems", "page_first": 2879, "page_last": 2887, "abstract": "Local differential privacy has recently surfaced as a strong measure of privacy in contexts where personal information remains private even from data analysts. Working in a setting where the data providers and data analysts want to maximize the utility of statistical inferences performed on the released data, we study the fundamental tradeoff between local differential privacy and information theoretic utility functions. We introduce a family of extremal privatization mechanisms, which we call staircase mechanisms, and prove that it contains the optimal privatization mechanism that maximizes utility. We further show that for all information theoretic utility functions studied in this paper, maximizing utility is equivalent to solving a linear program, the outcome of which is the optimal staircase mechanism. However, solving this linear program can be computationally expensive since it has a number of variables that is exponential in the data size. 
To account for this, we show that two simple staircase mechanisms, the binary and randomized response mechanisms, are universally optimal in the high and low privacy regimes, respectively, and well approximate the intermediate regime.", "full_text": "Extremal Mechanisms for Local Differential Privacy\n\nPeter Kairouz1\n\nSewoong Oh2\n\nPramod Viswanath1\n\n1Department of Electrical & Computer Engineering\n\n2Department of Industrial & Enterprise Systems Engineering\n\nUniversity of Illinois Urbana-Champaign\n\n{kairouz2,swoh,pramodv}@illinois.edu\n\nUrbana, IL 61801, USA\n\nAbstract\n\nLocal differential privacy has recently surfaced as a strong measure of privacy\nin contexts where personal information remains private even from data analysts.\nWorking in a setting where the data providers and data analysts want to maximize\nthe utility of statistical inferences performed on the released data, we study the\nfundamental tradeoff between local differential privacy and information theoretic\nutility functions. We introduce a family of extremal privatization mechanisms,\nwhich we call staircase mechanisms, and prove that it contains the optimal privati-\nzation mechanism that maximizes utility. We further show that for all information\ntheoretic utility functions studied in this paper, maximizing utility is equivalent\nto solving a linear program, the outcome of which is the optimal staircase mech-\nanism. However, solving this linear program can be computationally expensive\nsince it has a number of variables that is exponential in the data size. 
To account\nfor this, we show that two simple staircase mechanisms, the binary and random-\nized response mechanisms, are universally optimal in the high and low privacy\nregimes, respectively, and well approximate the intermediate regime.\n\n1\n\nIntroduction\n\nIn statistical analyses involving data from individuals, there is an increasing tension between the\nneed to share the data and the need to protect sensitive information about the individuals. For\nexample, users of social networking sites are increasingly cautious about their privacy, but still \ufb01nd\nit inevitable to agree to share their personal information in order to bene\ufb01t from customized services\nsuch as recommendations and personalized search [1, 2]. There is a certain utility in sharing data for\nboth data providers and data analysts, but at the same time, individuals want plausible deniability\nwhen it comes to sensitive information.\nFor such systems, there is a natural core optimization problem to be solved. Assuming both the\ndata providers and analysts want to maximize the utility of the released data, how can they do so\nwhile preserving the privacy of participating individuals? The formulation and study of an optimal\nframework addressing this tradeoff is the focus of this paper.\nLocal differential privacy. The need for data privacy appears in two different contexts: the local\nprivacy context, as in when individuals disclose their personal information (e.g., voluntarily on\nsocial network sites), and the global privacy context, as in when institutions release databases of\ninformation of several people or answer queries on such databases (e.g., US Government releases\ncensus data, companies like Net\ufb02ix release proprietary data for others to test state of the art data\nanalytics). In both contexts, privacy is achieved by randomizing the data before releasing it. 
We\nstudy the setting of local privacy, in which data providers do not trust the data collector (analyst).\nLocal privacy dates back to Warner [29], who proposed the randomized response method to provide\nplausible deniability for individuals responding to sensitive surveys.\n\n1\n\n\fA natural notion of privacy protection is making inference of information beyond what is released\nhard. Differential privacy has been proposed in the global privacy context to formally capture this\nnotion of privacy [11, 13, 12]. In a nutshell, differential privacy ensures that an adversary should\nnot be able to reliably infer whether or not a particular individual is participating in the database\nquery, even with unbounded computational power and access to every entry in the database except\nfor that particular individual\u2019s data. Recently, the notion of differential privacy has been extended\nto the local privacy context [10]. Formally, consider a setting where there are n data providers each\nowning a data Xi de\ufb01ned on an input alphabet X . In this paper, we shall deal, almost exclusively,\nwith \ufb01nite alphabets. The Xi\u2019s are independently sampled from some distribution P\u03bd parameterized\nby \u03bd \u2208 {0, 1}. A statistical privatization mechanism Qi is a conditional distribution that maps\nXi \u2208 X stochastically to Yi \u2208 Y, where Y is an output alphabet possibly larger than X . The\nYi\u2019s are referred to as the privatized (sanitized) views of Xi\u2019s. In a non-interactive setting where\nthe individuals do not communicate with each other and the Xi\u2019s are independent and identically\ndistributed, the same privatization mechanism Q is used by all individuals. 
For a non-negative ε, we\nfollow the definition of [10] and say that a mechanism Q is ε-locally differentially private if\n\nsup_{S∈σ(Y), x,x′∈X} Q(S|Xi = x) / Q(S|Xi = x′) ≤ e^ε ,  (1)\n\nwhere σ(Y) denotes an appropriate σ-field on Y.\nInformation theoretic utilities for statistical analyses. The data analyst is interested in the statistics of the data as opposed to individual samples. Naturally, the utility should also be measured in terms of the distribution rather than sample quantities. Concretely, consider a client-server setting, where each client with data Xi sends a privatized version of the data Yi, via an ε-locally differentially private privatization mechanism Q. Given the privatized views {Yi}_{i=1}^n, the data analyst wants to make inferences based on the induced marginal distribution\n\nMν(S) ≡ ∫ Q(S|x) dPν(x) ,  (2)\n\nfor S ∈ σ(Y) and ν ∈ {0, 1}. The power to discriminate data generated from P0 from data generated from P1 depends on the 'distance' between the marginals M0 and M1. To measure the ability of such statistical discrimination, our choice of utility of a particular privatization mechanism Q is an information theoretic quantity called Csiszár's f-divergence, defined as\n\nDf(M0||M1) = ∫ f(dM0/dM1) dM1 ,  (3)\n\nfor some convex function f such that f(1) = 0. The Kullback-Leibler (KL) divergence Dkl(M0||M1) is a special case with f(x) = x log x, and so is the total variation ‖M0 − M1‖_TV with f(x) = (1/2)|x − 1|. Such f-divergences capture the quality of statistical inference, such as minimax rates of statistical estimation or error exponents in hypothesis testing [28]. As a motivating example, suppose a data analyst wants to test whether the data is generated from P0 or P1 based on privatized views Y1, . . . 
, Yn. According to the Chernoff-Stein lemma, for a bounded type I error probability, the best type II error probability scales as e^{−n Dkl(M0||M1)}. Naturally, we are interested in finding a privatization mechanism Q that minimizes the probability of error by solving the following constrained maximization problem\n\nmaximize_{Q∈Dε} Dkl(M0||M1) ,  (4)\n\nwhere Dε is the set of all ε-locally differentially private mechanisms satisfying (1). Motivated by such applications in statistical inference, our goal is to provide a general framework for finding optimal privatization mechanisms that maximize the f-divergence between the induced marginals under local differential privacy.\nContributions. We study the fundamental tradeoff between local differential privacy and f-divergence utility functions. The privacy-utility tradeoff is posed as a constrained maximization problem: maximize f-divergence utility functions subject to local differential privacy constraints. This maximization problem is (a) nonlinear: f-divergences are convex in Q; (b) non-standard: we are maximizing instead of minimizing a convex function; and (c) infinite dimensional: the space of all differentially private mechanisms is uncountable. We show, in Theorem 2.1, that for all f-divergences, any ε, and any pair of distributions P0 and P1, a finite family of extremal mechanisms (a subset of the corner points of the space of privatization mechanisms), which we call staircase mechanisms, contains the optimal privatization mechanism. We further prove, in Theorem 2.2, that solving the original problem is equivalent to solving a linear program, the outcome of which is the optimal staircase mechanism. However, solving this linear program can be computationally expensive since it has 2^|X| variables. 
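For finite alphabets, the objects introduced so far are directly computable. The following is a minimal sketch (our own illustration, not from the paper; it assumes numpy and uses function names of our choosing) of checking the privacy constraint (1) for a finite mechanism and evaluating the induced marginals (2) together with the KL instance of (3):

```python
import numpy as np

def is_eps_ldp(Q, eps, tol=1e-9):
    """Check (1) for a finite mechanism: rows of Q index inputs x, columns outputs y.
    eps-LDP holds iff, within every column, max/min <= e^eps."""
    Q = np.asarray(Q, dtype=float)
    return all(Q[:, y].max() <= np.exp(eps) * Q[:, y].min() + tol
               for y in range(Q.shape[1]))

def induced_marginal(Q, P):
    """M(y) = sum_x Q(y|x) P(x), the finite-alphabet version of (2)."""
    return np.asarray(P, dtype=float) @ np.asarray(Q, dtype=float)

def kl(p, q):
    """D_kl(p||q) for finite distributions with full support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))
```

For any mechanism Q, the data-processing inequality guarantees `kl(induced_marginal(Q, P0), induced_marginal(Q, P1)) <= kl(P0, P1)`; the optimization problem (4) asks how close a private Q can come to that ceiling.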
To account for this, we show that two simple staircase mechanisms (the binary and randomized response mechanisms) are optimal in the high and low privacy regimes, respectively, and well approximate the intermediate regime. This constitutes important progress in the differential privacy area, where privatization mechanisms have been few and almost no exact optimality results are known. As an application, we show that the effective sample size reduces from n to ε^2 n under local differential privacy in the context of hypothesis testing.\nRelated work. Our work is closely related to the recent work of [10], where an upper bound on Dkl(M0||M1) was derived under the same local differential privacy setting. Precisely, Duchi et al. proved that the value of the KL-divergence maximization problem in (4) is at most 4(e^ε − 1)^2 ‖P0 − P1‖_TV^2. This bound was further used to provide a minimax bound on statistical estimation using information theoretic converse techniques such as Fano's and Le Cam's inequalities.\nIn a similar spirit, we are also interested in maximizing information theoretic quantities of the marginals under local differential privacy. We generalize the results of [10], and provide stronger results in the sense that we (a) consider a broader class of information theoretic utilities; (b) provide explicit constructions of the optimal mechanisms; and (c) recover the existing result of [10, Theorem 1] (with a stronger condition on ε).\nWhile there is a vast literature on differential privacy, exact optimality results are only known for a few cases. The typical recipe is to propose a differentially private mechanism inspired by [11, 13, 26, 20], and then establish its near-optimality by comparing the achievable utility to a converse, for example in principal component analysis [8, 5, 19, 24], linear queries [21, 18], logistic regression [7] and histogram release [25]. 
In this paper, we take a different route and solve the utility maximization problem exactly.\nOptimal differentially private mechanisms are known only in a few cases. Ghosh et al. showed that the geometric noise-adding mechanism is optimal (under a Bayesian setting) for monotone utility functions under count queries (sensitivity one) [17]. This was generalized by Geng et al. (for a worst-case input setting), who proposed a family of mechanisms and proved its optimality for monotone utility functions under queries with arbitrary sensitivity [14, 16, 15]. The family of optimal mechanisms was called staircase mechanisms because for any y and any neighboring x and x′, the ratio of Q(y|x) to Q(y|x′) takes one of three possible values e^ε, e^−ε, or 1. Since the optimal mechanisms we develop also have an identical property, we retain the same nomenclature.\n\n2 Main results\n\nIn this section, we give a formal definition for staircase mechanisms and show that they are the optimal solutions to maximization problems of the form (5). Using the structure of staircase mechanisms, we propose a combinatorial representation for this family of mechanisms. This allows us to reduce the nonlinear program of (5) to a linear program with 2^|X| variables. Potentially, for any instance of the problem, one can solve this linear program to obtain the optimal privatization mechanism, albeit with significant computational challenges since the number of variables scales exponentially in the alphabet size. To address this, we prove that two simple staircase mechanisms, which we call the binary mechanism and the randomized response mechanism, are optimal in the high and low privacy regimes, respectively. We also show how our results can be used to derive upper bounds on f-divergences under privacy. 
Finally, we give a concrete example illustrating the exact tradeoff between privacy and statistical inference in the context of hypothesis testing.\n\n2.1 Optimality of staircase mechanisms\nConsider a random variable X ∈ X generated according to Pν, ν ∈ {0, 1}. The distribution of the privatized output Y, whenever X is distributed according to Pν, is represented by Mν and given by (2). We are interested in characterizing the optimal solution of\n\nmaximize_{Q∈Dε} Df(M0||M1) ,  (5)\n\nwhere Dε is the set of all ε-locally differentially private mechanisms satisfying, for all x, x′ ∈ X and y ∈ Y,\n\n0 ≤ | ln( Q(y|x) / Q(y|x′) ) | ≤ ε .  (6)\n\nThis includes maximization over information theoretic quantities of interest in statistical estimation and hypothesis testing such as total variation, KL-divergence, and χ²-divergence [28]. In general this is a complicated nonlinear program: we are maximizing a convex function in Q; further, the dimension of Q might be unbounded: the optimal privatization mechanism Q* might produce an infinite output alphabet Y. The following theorem proves that one never needs an output alphabet larger than the input alphabet in order to achieve the maximum divergence, and provides a combinatorial representation of the optimal solution.\nTheorem 2.1. 
For any ε, any pair of distributions P0 and P1, and any f-divergence, there exists an optimal mechanism Q* maximizing the f-divergence in (5) over all ε-locally differentially private mechanisms, such that\n\n| ln( Q*(y|x) / Q*(y|x′) ) | ∈ {0, ε} ,  (7)\n\nfor all y ∈ Y, x, x′ ∈ X, and the output alphabet size is at most equal to the input alphabet size: |Y| ≤ |X|.\n\nThe optimal solution is an extremal mechanism, since the absolute value of the log-likelihood ratios can only take one of the two extremal values (see Figure 1). We refer to such a mechanism as a staircase mechanism, and define the family of staircase mechanisms as\n\nSε ≡ {Q | satisfying (7)} .\n\nThis family includes all the optimal mechanisms (for all choices of ε ≥ 0, P0, P1 and f), and since (7) implies (6), staircase mechanisms are locally differentially private.\n\nFigure 1: Examples of staircase mechanisms: the binary mechanism (two outputs, entries e^ε/(1+e^ε) and 1/(1+e^ε)) and the randomized response mechanism (entries e^ε/(3+e^ε) on the diagonal and 1/(3+e^ε) off-diagonal for |X| = 4).\n\nFor global differential privacy, we can generalize the definition of staircase mechanisms to hold for all neighboring database queries x, x′ (or equivalently within some sensitivity), and recover all known existing optimal mechanisms. Precisely, the geometric mechanism shown to be optimal in [17], and the mechanisms shown to be optimal in [14, 16] (also called staircase mechanisms) are special cases of the staircase mechanisms defined above. 
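The two mechanisms pictured in Figure 1 can be written out explicitly and checked against the staircase property (7). A sketch (our own code, not from the paper; it assumes numpy, takes k = 4 inputs, and assumes a particular partition of inputs for the binary mechanism):

```python
import numpy as np

eps, k = 1.0, 4
e = np.exp(eps)

# Randomized response (Figure 1, right): honest answer w.p. e^eps / (k - 1 + e^eps).
rr = (np.ones((k, k)) + (e - 1) * np.eye(k)) / (k - 1 + e)

# Binary mechanism (Figure 1, left): a single output bit. Here we assume the first
# two symbols form the set {x : P0(x) >= P1(x)} for some illustrative P0, P1.
more_likely_under_P0 = np.array([True, True, False, False])
binary = np.where(more_likely_under_P0[:, None],
                  [e / (1 + e), 1 / (1 + e)],
                  [1 / (1 + e), e / (1 + e)])

def is_staircase(Q, eps, tol=1e-9):
    """Property (7): |ln(Q(y|x)/Q(y|x'))| is 0 or eps for every output column."""
    for col in np.asarray(Q, dtype=float).T:
        ratios = np.abs(np.log(col[:, None] / col[None, :]))
        if not np.all((ratios < tol) | (np.abs(ratios - eps) < tol)):
            return False
    return True
```

Both matrices are row-stochastic, and every column takes only the two values allowed by (7), which is exactly the "staircase" structure the theorem singles out.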
We believe that the characterization of these extremal mechanisms and the analysis techniques developed in this paper can be of independent interest to researchers interested in optimal mechanisms for global privacy and more general utilities.\nCombinatorial representation of the staircase mechanisms. Now that we know staircase mechanisms are optimal, we can try to combinatorially search for the best staircase mechanism for any fixed ε, P0, P1, and f. To this end, we give a simple representation of all staircase mechanisms, exploiting the fact that they are scaled copies of a finite number of patterns.\nLet Q ∈ R^{|X|×|Y|} be a staircase mechanism and k = |X| denote the input alphabet size. Then, using the definition of staircase mechanisms, Q(y|x)/Q(y|x′) ∈ {e^−ε, 1, e^ε} and each column Q(y|·) must be proportional to one of the canonical staircase patterns. For example, when k = 3, there are 2^k = 8 canonical patterns. Define a staircase pattern matrix S(k) ∈ {1, e^ε}^{k×2^k} taking values either 1 or e^ε, such that the i-th column of S(k) has a staircase pattern corresponding to the binary representation of i − 1 ∈ {0, . . . , 2^k − 1}. We order the columns of S(k) in this fashion for notational convenience. For example,\n\nS(3) = [ 1 1 1 1 e^ε e^ε e^ε e^ε ; 1 1 e^ε e^ε 1 1 e^ε e^ε ; 1 e^ε 1 e^ε 1 e^ε 1 e^ε ] .\n\nFor all values of k, there are exactly 2^k such patterns, and any column Q(y|·) is a scaled version of one of the columns of S(k). Using this \u201cpattern\u201d matrix, we will show that we can represent (an equivalence class of) any staircase mechanism Q as\n\nQ = S(k) Θ ,  (8)\n\nwhere Θ ∈ R^{2^k×2^k} is a diagonal matrix representing the scaling of the columns of S(k). 
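The pattern matrix S(k) can be generated mechanically from the binary representations described above. A small sketch (our own helper name, assuming numpy):

```python
import numpy as np

def staircase_pattern_matrix(k, eps):
    """S(k): a k x 2^k matrix with entries in {1, e^eps}. The i-th column (0-indexed
    here) follows the binary representation of i, most significant bit in row 0."""
    S = np.ones((k, 2 ** k))
    for i in range(2 ** k):
        for x in range(k):
            if (i >> (k - 1 - x)) & 1:   # bit x of i (MSB first) selects e^eps
                S[x, i] = np.exp(eps)
    return S
```

With k = 3 this reproduces the 3 x 8 matrix S(3) displayed above, column for column.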
We can now formulate the problem of maximizing the divergence between the induced marginals as a linear program, and prove that it is equivalent to the original nonlinear program.\nTheorem 2.2. For any ε, any pair of distributions P0 and P1, and any f-divergence, the nonlinear program of (5) and the following linear program have the same optimal value:\n\nmaximize_{Θ∈R^{2^k×2^k}} ∑_{i=1}^{2^k} μ(S(k)_i) Θ_ii\nsubject to S(k) Θ 1 = 1 , Θ ≥ 0 , Θ is a diagonal matrix ,  (9)\n\nwhere μ(S(k)_i) = (∑_{x∈X} P1(x) S(k)_xi) f( ∑_{x∈X} P0(x) S(k)_xi / ∑_{x∈X} P1(x) S(k)_xi ) and S(k)_i is the i-th column of S(k), such that Df(M0||M1) = ∑_{i=1}^{2^k} μ(S(k)_i) Θ_ii. The solutions of (5) and (9) are related by (8).\n\nThe infinite dimensional nonlinear program of (5) is now reduced to a finite dimensional linear program. The first constraint ensures that we get a valid probability transition matrix Q = S(k)Θ with a row sum of one. One could potentially solve this LP with 2^k variables, but its computational complexity scales exponentially in the alphabet size k = |X|. For practical values of k this might not always be possible. However, in the following section, we give a precise description of the optimal mechanisms in the high privacy and low privacy regimes.\nIn order to understand the above theorem, observe that both the f-divergences and the differential privacy constraints are invariant under permutation (or relabelling) of the columns of a privatization mechanism Q. For example, the KL-divergence Dkl(M0||M1) does not change if we permute the columns of Q. Similarly, both the f-divergences and the differential privacy constraints are invariant under merging/splitting of outputs with the same pattern. 
To be specific, consider a privatization mechanism Q and suppose there exist two outputs y and y′ that have the same pattern, i.e., Q(y′|·) = C Q(y|·) for some positive constant C. Then, we can consider a new mechanism Q′ obtained by merging the two columns corresponding to y and y′. Let y′′ denote this new output. It follows that Q′ satisfies the differential privacy constraints and the resulting f-divergence is also preserved. Precisely, using the fact that Q(y′|·) = C Q(y|·), it follows that\n\nM′0(y′′)/M′1(y′′) = ∑_x (Q(y|x) + Q(y′|x)) P0(x) / ∑_x (Q(y|x) + Q(y′|x)) P1(x) = (1 + C) ∑_x Q(y|x) P0(x) / ( (1 + C) ∑_x Q(y|x) P1(x) ) = M0(y)/M1(y) = M0(y′)/M1(y′) ,\n\nand thus the corresponding f-divergence is invariant:\n\nf( M0(y)/M1(y) ) M1(y) + f( M0(y′)/M1(y′) ) M1(y′) = f( M′0(y′′)/M′1(y′′) ) M′1(y′′) .\n\nWe can naturally define equivalence classes for staircase mechanisms that are equivalent up to a permutation of columns and merging/splitting of columns with the same pattern:\n\n[Q] = {Q′ ∈ Sε | there exists a sequence of permutations and merges/splits of columns from Q′ to Q} .  (10)\n\nTo represent an equivalence class, we use a mechanism in [Q] that is ordered and merged to match the patterns of the pattern matrix S(k). For any staircase mechanism Q, there exists a possibly different staircase mechanism Q′ ∈ [Q] such that Q′ = S(k)Θ for some diagonal matrix Θ with nonnegative entries. Therefore, to solve optimization problems of the form (5), we can restrict our attention to such representatives of equivalence classes. 
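For modest alphabet sizes, the linear program of Theorem 2.2 can be solved directly. The following is a sketch under stated assumptions (numpy and scipy available; the function name and code organization are ours, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

def solve_staircase_lp(P0, P1, eps, f):
    """Sketch of the LP (9): maximize sum_i mu(S_i) * theta_i subject to
    S theta = 1 (rows of Q = S Theta sum to one) and theta >= 0."""
    P0, P1 = np.asarray(P0, dtype=float), np.asarray(P1, dtype=float)
    k = len(P0)
    # Pattern matrix S(k): column i follows the binary representation of i.
    S = np.ones((k, 2 ** k))
    for i in range(2 ** k):
        for x in range(k):
            if (i >> (k - 1 - x)) & 1:
                S[x, i] = np.exp(eps)
    a0, a1 = P0 @ S, P1 @ S              # sum_x P_nu(x) S_xi for nu = 0, 1
    mu = a1 * f(a0 / a1)                 # mu(S_i) as defined in Theorem 2.2
    res = linprog(c=-mu, A_eq=S, b_eq=np.ones(k),
                  bounds=(0, None), method="highs")  # linprog minimizes, so negate
    theta = res.x
    return S * theta, float(-res.fun)    # Q = S(k) Theta and the optimal value
```

With f(t) = t log t this returns a mechanism achieving the maximum KL-divergence between the induced marginals; the cost is the 2^k variables noted above, which is why the next section matters for large alphabets.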
Further, for privatization mechanisms of the form Q = S(k)Θ, the f-divergences take the form given in (9), a simple linear function of Θ.\n\n2.2 Optimal mechanisms in high and low privacy regimes\n\nFor a given P0 and P1, the binary mechanism is defined as a staircase mechanism with only two outputs y ∈ {0, 1} satisfying (see Figure 1)\n\nQ(0|x) = e^ε/(1+e^ε) if P0(x) ≥ P1(x), and 1/(1+e^ε) if P0(x) < P1(x) ;\nQ(1|x) = 1/(1+e^ε) if P0(x) ≥ P1(x), and e^ε/(1+e^ε) if P0(x) < P1(x) .  (11)\n\nAlthough this mechanism is extremely simple, perhaps surprisingly, we will establish that it is the optimal mechanism when a high level of privacy is required. Intuitively, the output is very noisy in the high privacy regime, and we are better off sending just one bit of information that tells us whether the data is more likely to have come from P0 or P1.\nTheorem 2.3. For any pair of distributions P0 and P1, there exists a positive ε* that depends on P0 and P1 such that for any f-divergence and any positive ε ≤ ε*, the binary mechanism maximizes the f-divergence between the induced marginals over all ε-locally differentially private mechanisms.\n\nThis implies that in the high privacy regime, which is a typical setting studied in much of the differential privacy literature, the binary mechanism is a universally optimal solution for all f-divergences in (5). In particular, this threshold ε* is universal, in that it does not depend on the particular choice of f-divergence being maximized. This is established by proving a very strong statistical dominance using Blackwell's celebrated result on comparisons of statistical experiments [4]. In a nutshell, we prove that for sufficiently small ε, the output of any ε-locally differentially private mechanism can be simulated from the output of the binary mechanism. 
Hence, the binary mechanism dominates over all other mechanisms and at the same time achieves the maximum divergence. A similar idea has been used previously in [27] to exactly characterize how much privacy degrades under composition.\nThe optimality of the binary mechanism is not restricted to the high privacy regime. The next theorem shows that it is the optimal solution of (5) for all ε when the objective function is the total variation Df(M0||M1) = ‖M0 − M1‖_TV.\nTheorem 2.4. For any pair of distributions P0 and P1, and any ε ≥ 0, the binary mechanism maximizes the total variation between the induced marginals M0 and M1 among all ε-locally differentially private mechanisms.\n\nWhen maximizing the KL-divergence between the induced marginals, we show that the binary mechanism still achieves good performance for all ε ≤ C, where C ≥ ε* now does not depend on P0 and P1. For the special case of KL-divergence, let OPT denote the maximum value of (5) and BIN denote the KL-divergence when the binary mechanism is used. The next theorem shows that\n\nBIN ≥ (1/(2(e^ε + 1)^2)) OPT .\n\nTheorem 2.5. For any ε and for any pair of distributions P0 and P1, the binary mechanism is a 1/(2(e^ε + 1)^2) approximation of the maximum KL-divergence between the induced marginals M0 and M1 among all ε-locally differentially private mechanisms.\nNote that 2(e^ε + 1)^2 ≤ 32 for ε ≤ 1, and ε ≤ 1 is a common regime of interest in differential privacy. 
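The binary mechanism of (11) is straightforward to construct. The sketch below (our own code, assuming numpy, with an assumed output convention that y = 0 corresponds to the set where P0(x) ≥ P1(x)) also exposes the exact total variation contraction that Theorem 2.4 shows is best possible:

```python
import numpy as np

def binary_mechanism(P0, P1, eps):
    """Binary mechanism of (11): one private bit indicating whether the input is
    more likely under P0 or under P1 (output labeling assumed, not from the paper)."""
    P0, P1 = np.asarray(P0, dtype=float), np.asarray(P1, dtype=float)
    e = np.exp(eps)
    return np.where((P0 >= P1)[:, None],
                    [e / (1 + e), 1 / (1 + e)],
                    [1 / (1 + e), e / (1 + e)])

def tv(p, q):
    """Total variation distance between finite distributions."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())
```

Under this mechanism the induced marginals satisfy tv(M0, M1) = ((e^ε − 1)/(e^ε + 1)) tv(P0, P1), the equality case of Corollary 2.9 below.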
Therefore, we can always use the simple binary mechanism in this regime, and the resulting divergence is at most a constant factor away from the optimal one.\nThe randomized response mechanism is defined as a staircase mechanism with the same set of outputs as inputs, Y = X, satisfying (see Figure 1)\n\nQ(y|x) = e^ε/(|X| − 1 + e^ε) if y = x, and 1/(|X| − 1 + e^ε) if y ≠ x .\n\nIt is a randomization over the same alphabet where we are more likely to give an honest response. We view it as a multiple choice generalization of the randomized response proposed by Warner [29], assuming an equal privacy level for all choices. We establish that this is the optimal mechanism when a low level of privacy is required. Intuitively, the noise is small in the low privacy regime, and we want to send as much information about our current data as allowed, but no more. For the special case of maximizing KL-divergence, we show that the randomized response mechanism is the optimal solution of (5) in the low privacy regime (ε ≥ ε*).\nTheorem 2.6. For any pair of distributions P0 and P1, there exists a positive ε* that depends on P0 and P1 such that for all ε ≥ ε*, the randomized response mechanism maximizes the KL-divergence between the induced marginals over all ε-locally differentially private mechanisms.\n\n2.3 Lower bounds in differential privacy\n\nIn this section, we provide converse results on the fundamental limit of differentially private mechanisms. These results follow from our main theorems and are of independent interest in other applications where lower bounds in statistical analysis are studied [3, 21, 6, 9]. For example, a bound similar to (12) was used to provide converse results on the sample complexity for statistical estimation with differentially private data in [10].\nCorollary 2.7. 
For any ε ≥ 0, let Q be any conditional distribution that guarantees ε-local differential privacy. Then, for any pair of distributions P0 and P1 and any positive δ > 0, there exists a positive ε* that depends on P0, P1, and δ such that for any ε ≤ ε*, the induced marginals M0 and M1 satisfy the bound\n\nDkl(M0||M1) + Dkl(M1||M0) ≤ ( 2(1 + δ)(e^ε − 1)^2 / (e^ε + 1) ) ‖P0 − P1‖_TV^2 .  (12)\n\nThis follows from Theorem 2.3 and the fact that under the binary mechanism, Dkl(M0||M1) = ‖P0 − P1‖_TV^2 (e^ε − 1)^2/(e^ε + 1) + O(ε^3). Compared to [10, Theorem 1], we recover their bound of 4(e^ε − 1)^2 ‖P0 − P1‖_TV^2 with a smaller constant. We note that Duchi et al.'s bound holds for all values of ε and uses different techniques; however, no achieving mechanism is provided there. We instead provide an explicit mechanism that is optimal in the high privacy regime.\nSimilarly, in the low privacy regime, we can show the following converse result.\nCorollary 2.8. For any ε ≥ 0, let Q be any conditional distribution that guarantees ε-local differential privacy. 
Then, for any pair of distributions P0 and P1 and any positive δ > 0, there exists a positive ε* that depends on P0, P1, and δ such that for any ε ≥ ε*, the induced marginals M0 and M1 satisfy the bound\n\nDkl(M0||M1) + Dkl(M1||M0) ≤ Dkl(P0||P1) + Dkl(P1||P0) − (1 − δ)( G(P0, P1) + G(P1, P0) ) e^−ε ,\n\nwhere G(P0, P1) = ∑_{x∈X} (1 − P0(x)) log(P1(x)/P0(x)).\n\nThis follows directly from Theorem 2.6 and the fact that under the randomized response mechanism, Dkl(M0||M1) = Dkl(P0||P1) − G(P0, P1) e^−ε + O(e^−2ε).\nSimilarly, for total variation, we can get the following converse result. This follows from Theorem 2.4 and explicitly computing the total variation achieved by the binary mechanism.\nCorollary 2.9. For any ε ≥ 0, let Q be any conditional distribution that guarantees ε-local differential privacy. Then, for any pair of distributions P0 and P1, the induced marginals M0 and M1 satisfy the bound ‖M0 − M1‖_TV ≤ ((e^ε − 1)/(e^ε + 1)) ‖P0 − P1‖_TV, and equality is achieved by the binary mechanism.\n\n2.4 Connections to hypothesis testing\n\nUnder the data collection scenario, there are n individuals, each with data Xi sampled from a distribution Pν for a fixed ν ∈ {0, 1}. Let Q be a non-interactive privatization mechanism guaranteeing ε-local differential privacy. The privatized views {Yi}_{i=1}^n are independently distributed according to one of the induced marginals M0 or M1 defined in (2).\n\nGiven the privatized views {Yi}_{i=1}^n, the data analyst wants to test whether they were generated from M0 or M1. Let the null hypothesis be H0: the Yi's are generated from M0, and the alternative hypothesis H1: the Yi's are generated from M1. 
For a choice of rejection region R ⊆ Y^n, the probability of false alarm (type I error) is α = M0^n(R) and the probability of miss detection (type II error) is β = M1^n(Y^n \\ R). Let β_α* = min_{R⊆Y^n, α<α*} β denote the minimum type II error achievable while keeping the type I error rate at most α*. According to the Chernoff-Stein lemma, we know that\n\nlim_{n→∞} (1/n) log β_α* = −Dkl(M0||M1) .\n\nSuppose the analyst knows P0, P1, and Q. Then, in order to achieve the optimal asymptotic error rate, one would want to maximize the KL-divergence between the induced marginals over all ε-locally differentially private mechanisms Q. Theorems 2.3 and 2.6 provide an explicit construction of the optimal mechanisms in the high and low privacy regimes. Further, our converse results in Section 2.3 provide a fundamental limit on the achievable error rates under differential privacy. Precisely, with data collected from an ε-locally differentially private mechanism, one cannot achieve an asymptotic type II error smaller than\n\nlim_{n→∞} (1/n) log β_α* ≥ − ( (1 + δ)(e^ε − 1)^2/(e^ε + 1) ) ‖P0 − P1‖_TV^2 ≥ − ( (1 + δ)(e^ε − 1)^2/(2(e^ε + 1)) ) Dkl(P0||P1) ,\n\nwhenever ε ≤ ε*, where ε* is dictated by Theorem 2.3. In the equation above, the second inequality follows from Pinsker's inequality. Since (e^ε − 1)^2 = O(ε^2) for small ε, the effective sample size is reduced from n to ε^2 n. This is the price of privacy. 
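The ε^2 n effective sample size can be observed numerically. A sketch (our own example distributions, assuming numpy) comparing Dkl(M0||M1) under the binary mechanism with its leading term ‖P0 − P1‖_TV^2 (e^ε − 1)^2/(e^ε + 1):

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# Illustrative (assumed) pair of distributions on a 3-symbol alphabet.
P0 = np.array([0.7, 0.2, 0.1])
P1 = np.array([0.1, 0.3, 0.6])
tv = 0.5 * np.abs(P0 - P1).sum()

def binary_kl(eps):
    """KL-divergence between the induced marginals under the binary mechanism (11)."""
    e = np.exp(eps)
    Q = np.where((P0 >= P1)[:, None],
                 [e / (1 + e), 1 / (1 + e)],
                 [1 / (1 + e), e / (1 + e)])
    return kl(P0 @ Q, P1 @ Q)

# For small eps, binary_kl(eps) tracks tv**2 * (e^eps - 1)**2 / (e^eps + 1) = O(eps^2):
# halving eps roughly quarters the divergence, so n privatized samples carry about
# as much testing power as eps^2 * n raw ones.
```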
In the low privacy regime where ε ≥ ε∗, for ε∗ dictated by Theorem 2.6, one cannot achieve an asymptotic type II error smaller than the rate given by

    lim_{n→∞} (1/n) log β_{α∗} ≥ −Dkl(P0||P1) + (1 − δ)G(P0, P1) e^{−ε}.

3 Discussion

In this paper, we have considered f-divergence utility functions and assumed a setting where individuals cannot collaborate (communicate with each other) before releasing their data. It turns out that the optimality results presented in Section 2 are general and hold for a large class of convex utility functions [22]. In addition, the techniques developed in this work can be generalized to find optimal privatization mechanisms in a setting where different individuals can collaborate interactively and each individual can be an analyst [23].

Binary hypothesis testing is a canonical statistical inference problem with a wide range of applications. However, there are a number of nontrivial and interesting extensions to our work. Firstly, in some scenarios the Xi's could be correlated (e.g., when different individuals observe different functions of the same random variable). In this case, the data analyst is interested in inferring whether the data was generated from P0^n or P1^n, where Pν^n is one of two possible joint priors on X1, ..., Xn. This is a challenging problem because knowing Xi reveals information about Xj, j ≠ i. Therefore, the utility maximization problems for different individuals are coupled in this setting. Secondly, in some cases the data analyst need not have access to P0 and P1, but rather to two classes of prior distributions P_{θ0} and P_{θ1} for θ0 ∈ Λ0 and θ1 ∈ Λ1. Such problems are studied under the rubric of universal hypothesis testing and robust hypothesis testing.
One possible direction is to select the privatization mechanism that maximizes the worst case utility:

    Q∗ = arg max_{Q∈Dε} min_{θ0∈Λ0, θ1∈Λ1} Df(M_{θ0}||M_{θ1}),

where M_{θν} is the induced marginal under P_{θν}. Finally, the more general problem of private m-ary hypothesis testing is also an interesting but challenging one. In this setting, the Xi's can follow one of m distributions P0, P1, ..., P_{m−1}, and therefore the Yi's can follow one of m distributions M0, M1, ..., M_{m−1}. The utility can be defined as the average f-divergence between any two distributions, (1/(m(m − 1))) Σ_{i≠j} Df(Mi||Mj), or as the worst case one, min_{i≠j} Df(Mi||Mj).

References

[1] A. Acquisti. Privacy in electronic commerce and the economics of immediate gratification. In Proceedings of the 5th ACM Conference on Electronic Commerce, pages 21–29. ACM, 2004.

[2] A. Acquisti and J. Grossklags. What can behavioral economics teach us about privacy. Digital Privacy, page 329, 2007.

[3] A. Beimel, K. Nissim, and E. Omri. Distributed private data analysis: Simultaneously solving how and what. In Advances in Cryptology–CRYPTO 2008, pages 451–468. Springer, 2008.

[4] D. Blackwell. Equivalent comparisons of experiments. The Annals of Mathematical Statistics, 24(2):265–272, 1953.

[5] J. Blocki, A. Blum, A. Datta, and O. Sheffet. The Johnson-Lindenstrauss transform itself preserves differential privacy. In IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 410–419. IEEE, 2012.

[6] K. Chaudhuri and D. Hsu. Convergence rates for differentially private statistical estimation. arXiv preprint arXiv:1206.6395, 2012.

[7] K. Chaudhuri and C. Monteleoni. Privacy-preserving logistic regression. In NIPS, volume 8, pages 289–296, 2008.

[8] K. Chaudhuri, A. D. Sarwate, and K. Sinha. Near-optimal differentially private principal components. In NIPS, pages 998–1006, 2012.

[9] A. De. Lower bounds in differential privacy. In Theory of Cryptography, pages 321–338. Springer, 2012.

[10] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. In IEEE 54th Annual Symposium on Foundations of Computer Science, pages 429–438. IEEE, 2013.

[11] C. Dwork. Differential privacy. In Automata, Languages and Programming, pages 1–12. Springer, 2006.

[12] C. Dwork and J. Lei. Differential privacy and robust statistics. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 371–380. ACM, 2009.

[13] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, pages 265–284. Springer, 2006.

[14] Q. Geng and P. Viswanath. The optimal mechanism in differential privacy. arXiv preprint arXiv:1212.1186, 2012.

[15] Q. Geng and P. Viswanath. The optimal mechanism in differential privacy: Multidimensional setting. arXiv preprint arXiv:1312.0655, 2013.

[16] Q. Geng and P. Viswanath. The optimal mechanism in (ε, δ)-differential privacy. arXiv preprint arXiv:1305.1330, 2013.

[17] A. Ghosh, T. Roughgarden, and M. Sundararajan. Universally utility-maximizing privacy mechanisms. SIAM Journal on Computing, 41(6):1673–1693, 2012.

[18] M. Hardt, K. Ligett, and F. McSherry. A simple and practical algorithm for differentially private data release. In NIPS, pages 2348–2356, 2012.

[19] M. Hardt and A. Roth. Beating randomized response on incoherent matrices. In Proceedings of the 44th Symposium on Theory of Computing, pages 1255–1268. ACM, 2012.

[20] M. Hardt and G. N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In IEEE 51st Annual Symposium on Foundations of Computer Science, pages 61–70. IEEE, 2010.

[21] M. Hardt and K. Talwar. On the geometry of differential privacy. In Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 705–714. ACM, 2010.

[22] P. Kairouz, S. Oh, and P. Viswanath. Extremal mechanisms for local differential privacy. arXiv preprint arXiv:1407.1338, 2014.

[23] P. Kairouz, S. Oh, and P. Viswanath. Optimality of non-interactive randomized response. arXiv preprint arXiv:1407.1546, 2014.

[24] M. Kapralov and K. Talwar. On differentially private low rank approximation. In SODA, volume 5, page 1. SIAM, 2013.

[25] J. Lei. Differentially private M-estimators. In NIPS, pages 361–369, 2011.

[26] F. McSherry and K. Talwar. Mechanism design via differential privacy. In IEEE 48th Annual Symposium on Foundations of Computer Science, pages 94–103. IEEE, 2007.

[27] S. Oh and P. Viswanath. The composition theorem for differential privacy. arXiv preprint arXiv:1311.0776, 2013.

[28] A. B. Tsybakov and V. Zaiats. Introduction to Nonparametric Estimation, volume 11. Springer, 2009.

[29] S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.