{"title": "Accuracy First: Selecting a Differential Privacy Level for Accuracy Constrained ERM", "book": "Advances in Neural Information Processing Systems", "page_first": 2566, "page_last": 2576, "abstract": "Traditional approaches to differential privacy assume a fixed privacy requirement \u03b5 for a computation, and attempt to maximize the accuracy of the computation subject to the privacy constraint. As differential privacy is increasingly deployed in practical settings, it may often be that there is instead a fixed accuracy requirement for a given computation and the data analyst would like to maximize the privacy of the computation subject to the accuracy constraint. This raises the question of how to find and run a maximally private empirical risk minimizer subject to a given accuracy requirement. We propose a general \u201cnoise reduction\u201d framework that can apply to a variety of private empirical risk minimization (ERM) algorithms, using them to \u201csearch\u201d the space of privacy levels to find the empirically strongest one that meets the accuracy constraint, and incurring only logarithmic overhead in the number of privacy levels searched. The privacy analysis of our algorithm leads naturally to a version of differential privacy where the privacy parameters are dependent on the data, which we term ex-post privacy, and which is related to the recently introduced notion of privacy odometers. We also give an ex-post privacy analysis of the classical AboveThreshold privacy tool, modifying it to allow for queries chosen depending on the database. 
Finally, we apply our approach to two common objective functions, regularized linear and logistic regression, and empirically compare our noise reduction methods to (i) inverting the theoretical utility guarantees of standard private ERM algorithms and (ii) a stronger empirical baseline based on binary search.", "full_text": "Accuracy First: Selecting a Differential Privacy Level\n\nfor Accuracy-Constrained ERM\n\nKatrina Ligett\n\nCaltech and Hebrew University\n\nSeth Neel\n\nUniversity of Pennsylvania\n\nAaron Roth\n\nUniversity of Pennsylvania\n\nBo Waggoner\n\nUniversity of Pennsylvania\n\nZhiwei Steven Wu\nMicrosoft Research\n\nAbstract\n\nTraditional approaches to differential privacy assume a \ufb01xed privacy requirement\n\u03b5 for a computation, and attempt to maximize the accuracy of the computation\nsubject to the privacy constraint. As differential privacy is increasingly deployed in\npractical settings, it may often be that there is instead a \ufb01xed accuracy requirement\nfor a given computation and the data analyst would like to maximize the privacy of\nthe computation subject to the accuracy constraint. This raises the question of how\nto \ufb01nd and run a maximally private empirical risk minimizer subject to a given\naccuracy requirement. We propose a general \u201cnoise reduction\u201d framework that\ncan apply to a variety of private empirical risk minimization (ERM) algorithms,\nusing them to \u201csearch\u201d the space of privacy levels to \ufb01nd the empirically strongest\none that meets the accuracy constraint, and incurring only logarithmic overhead\nin the number of privacy levels searched. The privacy analysis of our algorithm\nleads naturally to a version of differential privacy where the privacy parameters\nare dependent on the data, which we term ex-post privacy, and which is related\nto the recently introduced notion of privacy odometers. 
We also give an ex-post privacy analysis of the classical AboveThreshold privacy tool, modifying it to allow for queries chosen depending on the database. Finally, we apply our approach to two common objective functions, regularized linear and logistic regression, and empirically compare our noise reduction methods to (i) inverting the theoretical utility guarantees of standard private ERM algorithms and (ii) a stronger empirical baseline based on binary search.¹

1 Introduction and Related Work

Differential Privacy [7, 8] enjoys over a decade of study as a theoretical construct, and a much more recent set of large-scale practical deployments, including by Google [10] and Apple [11]. As the large theoretical literature is put into practice, we start to see disconnects between assumptions implicit in the theory and the practical necessities of applications. In this paper we focus our attention on one such assumption in the domain of private empirical risk minimization (ERM): that the data analyst first chooses a privacy requirement, and then attempts to obtain the best accuracy guarantee (or empirical performance) that she can, given the chosen privacy constraint. Existing theory is tailored to this view: the data analyst can pick her privacy parameter ε via some exogenous process, and either plug it into a "utility theorem" to upper bound her accuracy loss, or simply deploy her algorithm and (privately) evaluate its performance. There is a rich and substantial literature on private convex ERM that takes this approach, weaving tight connections between standard mechanisms in differential privacy and standard tools for empirical risk minimization.

¹A full version of this paper appears on the arXiv preprint site: https://arxiv.org/abs/1705.10829.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
These methods for private ERM include output and objective perturbation [5, 14, 18, 4], covariance perturbation [19], the exponential mechanism [16, 2], and stochastic gradient descent [2, 21, 12, 6, 20].

While these existing algorithms take a privacy-first perspective, in practice, product requirements may impose hard accuracy constraints, and privacy (while desirable) may not be the overriding concern. In such situations, things are reversed: the data analyst first fixes an accuracy requirement, and then would like to find the smallest privacy parameter consistent with the accuracy constraint. Here, we find a gap between theory and practice. The only theoretically sound method available is to take a "utility theorem" for an existing private ERM algorithm, solve for the smallest value of ε (the differential privacy parameter) and the other parameter values consistent with the accuracy requirement, and then run the private ERM algorithm with the resulting ε. But because utility theorems tend to be worst-case bounds, this approach will generally be extremely conservative, leading to a much larger value of ε (and hence a much larger leakage of information) than is necessary for the problem at hand. Alternatively, the analyst could attempt an empirical search for the smallest value of ε consistent with her accuracy goals. However, because this search is itself a data-dependent computation, it incurs the overhead of additional privacy loss. Furthermore, it is not a priori clear how to undertake such a search with nontrivial privacy guarantees, for two reasons: first, the worst case could involve a very long search that reveals a large amount of information; and second, the selected privacy parameter is now itself a data-dependent quantity, so it is not sensible to claim a "standard" guarantee of differential privacy for any finite value of ε ex ante.

In this paper, we provide a principled variant of this second approach, which attempts to empirically find the smallest value of ε consistent with an accuracy requirement. We give a meta-method that can be applied to several interesting classes of private learning algorithms and introduces very little privacy overhead as a result of the privacy-parameter search. Conceptually, our meta-method initially computes a very private hypothesis, and then gradually subtracts noise (making the computation less and less private) until a sufficient level of accuracy is achieved. One key technique that significantly reduces privacy loss over naive search is the use of correlated noise generated by the method of [15], which formalizes the conceptual idea of "subtracting" noise without incurring additional privacy overhead. In order to select the most private of these hypotheses that meets the accuracy requirement, we introduce a natural modification of the now-classic AboveThreshold algorithm [8], which iteratively checks a sequence of queries on a dataset and privately releases the index of the first to approximately exceed some fixed threshold. Its privacy cost increases only logarithmically with the number of queries.
We provide an analysis of AboveThreshold that holds even if the queries themselves are the result of differentially private computations, showing that if AboveThreshold terminates after t queries, one pays only the privacy cost of AboveThreshold plus the privacy cost of revealing those first t private queries. When combined with the above-mentioned correlated noise technique of [15], this gives an algorithm whose privacy loss is equal to that of the final hypothesis output (the previous ones coming "for free") plus the privacy loss of AboveThreshold. Because the privacy guarantees achieved by this approach are not fixed a priori, but rather are a function of the data, we introduce and apply a new, corresponding privacy notion, which we term ex-post privacy, and which is closely related to the recently introduced notion of "privacy odometers" [17].

In Section 4, we empirically evaluate our noise reduction meta-method, which applies to any ERM technique that can be described as a post-processing of the Laplace mechanism. This includes both direct applications of the Laplace mechanism, like output perturbation [5], and more sophisticated methods like covariance perturbation [19], which perturbs the covariance matrix of the data and then performs an optimization using the noisy data. Our experiments concentrate on ℓ2-regularized least-squares regression and ℓ2-regularized logistic regression, and we apply our noise reduction meta-method to both output perturbation and covariance perturbation. Our empirical results show that the active, ex-post privacy approach massively outperforms inverting the theory curve, and also improves on a baseline "ε-doubling" approach.

2 Privacy Background and Tools

2.1 Differential Privacy and Ex-Post Privacy

Let X denote the data domain.
We call two datasets D, D′ ∈ X* neighbors (written D ∼ D′) if D can be derived from D′ by replacing a single data point with some other element of X.

Definition 2.1 (Differential Privacy [7]). Fix ε ≥ 0. A randomized algorithm A : X* → O is ε-differentially private if for every pair of neighboring data sets D ∼ D′ ∈ X*, and for every event S ⊆ O:

Pr[A(D) ∈ S] ≤ exp(ε) · Pr[A(D′) ∈ S].

We call exp(ε) the privacy risk factor.

It is possible to design computations that do not satisfy the differential privacy definition, but whose outputs are private to an extent that can be quantified after the computation halts. For example, consider an experiment that repeatedly runs an ε′-differentially private algorithm until a stopping condition defined by the output of the algorithm itself is met. This experiment does not satisfy ε-differential privacy for any fixed value of ε, since there is no fixed maximum number of rounds for which the experiment will run. (For a fixed number of rounds, a simple composition theorem, Theorem 2.5, shows that the ε-guarantees in a sequence of computations "add up.") However, if ex post we see that the experiment has stopped after k rounds, the data can in some sense be assured an "ex-post privacy loss" of only kε′. Rogers et al. [17] initiated the study of privacy odometers, which formalize this idea. They study privacy composition when the data analyst can choose the privacy parameters of subsequent computations as a function of the outcomes of previous computations. We apply a related idea here, for a different purpose. Our goal is to design one-shot algorithms that always achieve a target accuracy but that may have variable privacy levels depending on their input.

Definition 2.2.
Given a randomized algorithm A : X* → O, define the ex-post privacy loss² of A on outcome o to be

Loss(o) = max_{D,D′ : D∼D′} log( Pr[A(D) = o] / Pr[A(D′) = o] ).

We refer to exp(Loss(o)) as the ex-post privacy risk factor.

Definition 2.3 (Ex-Post Differential Privacy). Let E : O → (R≥0 ∪ {∞}) be a function on the outcome space of algorithm A : X* → O. Given an outcome o = A(D), we say that A satisfies E(o)-ex-post differential privacy if for all o ∈ O, Loss(o) ≤ E(o).

Note that if E(o) ≤ ε for all o, then A is ε-differentially private. Ex-post differential privacy has the same semantics as differential privacy once the output of the mechanism is known: it bounds the log-likelihood ratio of the dataset being D vs. D′, which controls how an adversary with an arbitrary prior on the two cases can update her posterior.

2.2 Differential Privacy Tools

Differentially private computations enjoy two nice properties:

Theorem 2.4 (Post-Processing [7]). Let A : X* → O be any ε-differentially private algorithm, and let f : O → O′ be any function. Then the algorithm f ∘ A : X* → O′ is also ε-differentially private.

Post-processing implies that, for example, every decision process based on the output of a differentially private algorithm is also differentially private.

Theorem 2.5 (Composition [7]). Let A₁ : X* → O and A₂ : X* → O′ be algorithms that are ε₁- and ε₂-differentially private, respectively. Then the algorithm A : X* → O × O′ defined as A(x) = (A₁(x), A₂(x)) is (ε₁ + ε₂)-differentially private.

The composition theorem holds even if the composition is adaptive; see [9] for details.

The Laplace mechanism. The most basic subroutine we will use is the Laplace mechanism.
The Laplace distribution centered at 0 with scale b is the distribution with probability density function Lap(z | b) = (1/2b) e^{−|z|/b}. We say X ∼ Lap(b) when X has Laplace distribution with scale b. Let f : X* → R^d be an arbitrary d-dimensional function. The ℓ1 sensitivity of f is defined to be Δ₁(f) = max_{D∼D′} ‖f(D) − f(D′)‖₁. The Laplace mechanism with parameter ε simply adds noise drawn independently from Lap(Δ₁(f)/ε) to each coordinate of f(x).

²If A's output is from a continuous distribution rather than discrete, we abuse notation and write Pr[A(D) = o] to mean the probability density at output o.

Theorem 2.6 ([7]). The Laplace mechanism is ε-differentially private.

Gradual private release. Koufogiannis et al. [15] study how to gradually release private data using the Laplace mechanism with an increasing sequence of ε values, with a privacy cost scaling only with the privacy of the marginal distribution on the least private release, rather than with the sum of the privacy costs of independent releases. For intuition, the algorithm can be pictured as a continuous random walk starting at some private data v, with the property that the marginal distribution at each point in time is Laplace centered at v, with variance increasing over time. Releasing the value of the random walk at a fixed point in time gives a certain output distribution, for example v̂, with a certain privacy guarantee ε.
To produce v̂′ whose ex-ante distribution has higher variance (is more private), one can simply "fast forward" the random walk from a starting point of v̂ to reach v̂′; to produce a less private v̂′, one can "rewind." The total privacy cost is max{ε, ε′} because, given the "least private" point (say v̂), all "more private" points can be derived as post-processings given by taking a random walk of a certain length starting at v̂. Note that were the Laplace random variables used for each release independent, the composition theorem would require summing the ε values of all releases.

In our private algorithms, we will use their noise reduction mechanism as a building block to generate a list of private hypotheses θ₁, ..., θ_T with gradually increasing ε values. Importantly, releasing any prefix (θ₁, ..., θ_t) only incurs the privacy loss in θ_t. More formally:

Algorithm 1 Noise Reduction [15]: NR(v, Δ, {ε_t})
  Input: private vector v, sensitivity parameter Δ, list ε₁ < ε₂ < ··· < ε_T
  Set v̂_T := v + Lap(Δ/ε_T)  ▷ drawn i.i.d. for each coordinate
  for t = T−1, T−2, ..., 1 do
    With probability (ε_t / ε_{t+1})²: set v̂_t := v̂_{t+1}
    Else: set v̂_t := v̂_{t+1} + Lap(Δ/ε_t)  ▷ drawn i.i.d. for each coordinate
  Return v̂₁, ..., v̂_T

Theorem 2.7 ([15]). Let f have ℓ1 sensitivity Δ, and let v̂₁, ..., v̂_T be the output of Algorithm 1 on v = f(D), Δ, and the increasing list ε₁, ..., ε_T. Then for any t, the algorithm which outputs the prefix (v̂₁, ..., v̂_t) is ε_t-differentially private.

2.3 AboveThreshold with Private Queries

Our high-level approach to our eventual ERM problem will be as follows: generate a sequence of hypotheses θ₁, ..., θ_T, each with increasing accuracy and decreasing privacy; then test their accuracy levels sequentially, outputting the first one whose accuracy is "good enough." The classical AboveThreshold algorithm [8] takes in a dataset and a sequence of queries and privately outputs the index of the first query to exceed a given threshold (with some error due to noise). We would like to use AboveThreshold to perform these accuracy checks, but there is an important obstacle: for us, the "queries" themselves depend on the private data.³ A standard composition analysis would involve first privately publishing all the queries, then running AboveThreshold on these queries (which are now public). Intuitively, though, it would be much better to generate and publish the queries one at a time, until AboveThreshold halts, at which point one would not publish any more queries. The problem with analyzing this approach is that, a priori, we do not know when AboveThreshold will terminate; to address this, we analyze the ex-post privacy guarantee of the algorithm.⁴

Let us say that an algorithm M(D) = (f₁, ..., f_T) is (ε₁, ..., ε_T)-prefix-private if for each t, the function that runs M(D) and outputs just the prefix (f₁, ..., f_t) is ε_t-differentially private.

Lemma 2.8. Let M : X* → (X* → O)^T be a (ε₁, ..., ε_T)-prefix-private algorithm that returns T queries, and let each query output by M have ℓ1 sensitivity at most Δ.
Then Algorithm 2, run on D, ε_A, W, Δ, and M, is E-ex-post differentially private for E((t, ·)) = ε_A + ε_t for any t ∈ [T].

³In fact, there are many applications beyond our own in which the sequence of queries input to AboveThreshold might be the result of some private prior computation on the data, and where we would like to release both the stopping index of AboveThreshold and the "query object." (In our case, the query objects will be parameterized by learned hypotheses θ₁, ..., θ_T.)

⁴This result does not follow from a straightforward application of privacy odometers from [17], because the privacy analysis of algorithms like the noise reduction technique is not compositional.

Algorithm 2 InteractiveAboveThreshold: IAT(D, ε, W, Δ, M)
  Input: dataset D, privacy loss ε, threshold W, ℓ1 sensitivity Δ, algorithm M
  Let Ŵ := W + Lap(2Δ/ε)
  for each query t = 1, ..., T do
    Query f_t ← M(D)_t
    if f_t(D) + Lap(4Δ/ε) ≥ Ŵ: then Output (t, f_t); Halt.
  Output (T, ⊥).

The proof, which is a variant on the proof of privacy for AboveThreshold [8], appears in the full version, along with an accuracy theorem for IAT.

3 Noise-Reduction with Private ERM

In this section, we provide a general private ERM framework that allows us to approach the best privacy guarantee achievable on the data given a target excess risk goal. Throughout the section, we consider an input dataset D that consists of n row vectors X₁, X₂, ..., X_n ∈ R^p and a column y ∈ R^n. We will assume that each ‖X_i‖₁ ≤ 1 and |y_i| ≤ 1. Let d_i = (X_i, y_i) ∈ R^{p+1} be the i-th data record. Let ℓ be a loss function such that for any hypothesis θ and any data point (X_i, y_i) the loss is ℓ(θ, (X_i, y_i)).
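The noise-reduction subroutine (Algorithm 1) is the engine that generates the hypothesis sequences used below. A minimal sketch, assuming pure-Python vectors and our own Laplace sampler (function names are ours, not the paper's); the per-step mixture mirrors the pseudocode of Algorithm 1 exactly:

```python
import random

def lap_noise(scale, dim):
    """dim i.i.d. Laplace(0, scale) samples: the difference of two
    Exp(1) variables is Laplace(0, 1)."""
    return [scale * (random.expovariate(1.0) - random.expovariate(1.0))
            for _ in range(dim)]

def noise_reduction(v, delta, eps_list):
    """Algorithm 1 (after Koufogiannis et al. [15]): returns noisy copies
    v_hat_1 ... v_hat_T of v, where releasing the prefix ending at v_hat_t
    is eps_list[t-1]-differentially private. eps_list must be increasing."""
    T, d = len(eps_list), len(v)
    v_hats = [None] * T
    # Least private point: the ordinary Laplace mechanism at the largest epsilon.
    v_hats[T - 1] = [vi + ni for vi, ni in
                     zip(v, lap_noise(delta / eps_list[-1], d))]
    # Walk backwards: with probability (eps_t / eps_{t+1})^2 keep the previous
    # point, else add fresh Laplace(delta/eps_t) noise, so that the marginal of
    # v_hat_t is Laplace centered at v with scale delta/eps_t per coordinate.
    for t in range(T - 2, -1, -1):
        if random.random() < (eps_list[t] / eps_list[t + 1]) ** 2:
            v_hats[t] = list(v_hats[t + 1])
        else:
            v_hats[t] = [vi + ni for vi, ni in
                         zip(v_hats[t + 1], lap_noise(delta / eps_list[t], d))]
    return v_hats
```

Releasing any prefix v_hats[:t] then costs only eps_list[t-1], which is what lets the framework below pay only for the last hypothesis it reveals.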
Given an input dataset D and a regularization parameter λ, the goal is to minimize the following regularized empirical loss function over some feasible set C:

L(θ, D) = (1/n) Σ_{i=1}^{n} ℓ(θ, (X_i, y_i)) + (λ/2)‖θ‖₂².

Let θ* = argmin_{θ∈C} L(θ, D). Given a target accuracy parameter α, we wish to privately compute a θ_p that satisfies L(θ_p, D) ≤ L(θ*, D) + α, while achieving the best ex-post privacy guarantee. For simplicity, we will sometimes write L(θ) for L(θ, D).

One simple baseline approach is a "doubling method": start with a small ε value, run an ε-differentially private algorithm to compute a hypothesis θ, and use the Laplace mechanism to estimate the excess risk of θ; if the excess risk is lower than the target, output θ; otherwise, double the value of ε and repeat. (See the full version for details.) As a result, we pay a privacy cost for every hypothesis we compute and every excess risk we estimate.

In comparison, our meta-method provides a more cost-effective way to select the privacy level. The algorithm takes a more refined set of privacy levels ε₁ < ... < ε_T as input and generates a sequence of hypotheses θ₁, ..., θ_T such that the generation of each θ_t is ε_t-private. Then it releases the hypotheses θ_t in order, halting as soon as a released hypothesis meets the accuracy goal. Importantly, there are two key components that reduce the privacy loss in our method:

1. We use Algorithm 1, the "noise reduction" method of [15], for generating the sequence of hypotheses: we first compute a very private and noisy θ₁, and then obtain the subsequent hypotheses by gradually "de-noising" θ₁. As a result, any prefix (θ₁, . . .
, θ_k) incurs a privacy loss of only ε_k (as opposed to ε₁ + ... + ε_k if the hypotheses were independent).

2. When evaluating the excess risk of each hypothesis, we use Algorithm 2, InteractiveAboveThreshold, to determine whether its excess risk exceeds the target threshold. This incurs substantially less privacy loss than independently evaluating the excess risk of each hypothesis using the Laplace mechanism (and hence allows us to search a finer grid of ε values).

For the rest of this section, we will instantiate our method concretely for two ERM problems: ridge regression and logistic regression. In particular, our noise-reduction method is based on two private ERM algorithms: the recently introduced covariance perturbation technique [19] and the output perturbation method [5].

3.1 Covariance Perturbation for Ridge Regression

In ridge regression, we consider the squared loss function ℓ((X_i, y_i), θ) = (1/2)(y_i − ⟨θ, X_i⟩)², and hence the empirical loss over the dataset is defined as

L(θ, D) = (1/2n)‖y − Xθ‖₂² + (λ/2)‖θ‖₂²,

where X denotes the (n × p) matrix with row vectors X₁, ..., X_n and y = (y₁, ..., y_n).
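For intuition, the unconstrained version of this objective has the closed form θ = (X⊤X + nλI)⁻¹X⊤y (set the gradient (1/n)(X⊤Xθ − X⊤y) + λθ to zero), so the minimizer depends on the data only through X⊤X and X⊤y. A minimal sketch for p = 2, using Cramer's rule so it stays self-contained (the function name is ours; the feasible-set constraint used in the paper is ignored here):

```python
def ridge_theta(xtx, xty, n, lam):
    """Minimize (1/2n)||y - X theta||^2 + (lam/2)||theta||^2 for p = 2,
    given only the sufficient statistics xtx = X^T X and xty = X^T y.
    Solves the 2x2 normal equations (X^T X + n*lam*I) theta = X^T y."""
    a = xtx[0][0] + n * lam
    b = xtx[0][1]
    c = xtx[1][0]
    d = xtx[1][1] + n * lam
    det = a * d - b * c  # assumed nonzero; guaranteed when lam > 0
    return [(d * xty[0] - b * xty[1]) / det,
            (a * xty[1] - c * xty[0]) / det]
```

This "sufficient statistics" view is exactly what makes covariance perturbation possible: add noise once to X⊤X and X⊤y, and every downstream solve is post-processing.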
Since the optimal solution for the unconstrained problem has ℓ2 norm no more than √(1/λ) (see the full version for a proof), we will focus on optimizing θ over the constrained set C = {a ∈ R^p | ‖a‖₂ ≤ √(1/λ)}, which will be useful for bounding the ℓ1 sensitivity of the empirical loss.

Before we formally introduce the covariance perturbation algorithm due to [19], observe that the optimal solution θ* can be computed as

θ* = argmin_{θ∈C} L(θ, D) = argmin_{θ∈C} [ (1/2n)(θ⊤(X⊤X)θ − 2⟨X⊤y, θ⟩) + (λ/2)‖θ‖₂² ].

In other words, θ* only depends on the private data through X⊤y and X⊤X. To compute a private hypothesis, the covariance perturbation method simply adds Laplace noise to each entry of X⊤y and X⊤X (the covariance matrix), and solves the optimization based on the noisy matrix and vector. The formal description of the algorithm and its guarantee are in Theorem 3.1. Our analysis differs from the one in [19] in that their paper considers the "local privacy" setting and adds Gaussian noise, whereas we use Laplace. The proof is deferred to the full version.

Theorem 3.1. Fix any ε > 0. For any input data set D, consider the mechanism M that computes

θ_p = argmin_{θ∈C} (1/2n)(θ⊤(X⊤X + B)θ − 2⟨X⊤y + b, θ⟩) + (λ/2)‖θ‖₂²,

where B ∈ R^{p×p} and b ∈ R^{p×1} are random Laplace matrices such that each entry of B and b is drawn from Lap(4/ε).
Then M satisfies ε-differential privacy, and the output θ_p satisfies

E_{B,b}[L(θ_p) − L(θ*)] ≤ 4√2 (2√(p/λ) + p/λ) / (nε).

In our algorithm COVNR, we will apply the noise reduction method, Algorithm 1, to produce a sequence of noisy versions of the private data (X⊤X, X⊤y): (Z¹, z¹), ..., (Z^T, z^T), one for each privacy level. Then for each (Z^t, z^t), we will compute the private hypothesis by solving the noisy version of the optimization problem in Equation (1). The full description of our algorithm COVNR is in Algorithm 3, and it satisfies the following guarantee:

Theorem 3.2. The instantiation of COVNR(D, {ε₁, ..., ε_T}, α, γ) outputs a hypothesis θ_p that with probability 1 − γ satisfies L(θ_p) − L(θ*) ≤ α. Moreover, it is E-ex-post differentially private, where the privacy loss function E : (([T] ∪ {⊥}) × R^p) → (R≥0 ∪ {∞}) is defined as E((k, ·)) = ε₀ + ε_k for any k ≠ ⊥, E((⊥, ·)) = ∞, and

ε₀ = 16(√(1/λ) + 1)² log(2T/γ) / (nα)

is the privacy loss incurred by IAT.

3.2 Output Perturbation for Logistic Regression

Next, we show how to combine the output perturbation method with noise reduction for the logistic regression problem.⁵ In this setting, the input data consists of n labeled examples (X₁, y₁), ..., (X_n, y_n), such that for each i, X_i ∈ R^p, ‖X_i‖₁ ≤ 1, and y_i ∈ {−1, 1}. The goal is to train a linear classifier given by a weight vector θ for the examples from the two classes.
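Output perturbation adds noise directly to the non-private minimizer. A minimal sketch for regularized logistic loss, using plain gradient descent as a hypothetical stand-in for an exact solver, with Laplace noise of scale r = 2√p/(nλε) as in Theorem 3.3 below (all function names and the solver parameters are ours):

```python
import math
import random

def output_perturbation_logistic(xs, ys, lam, eps, steps=2000, lr=0.1):
    """Sketch of output perturbation with Laplace noise: minimize
    (1/n) sum_i log(1 + exp(-y_i <theta, x_i>)) + (lam/2)||theta||^2
    by gradient descent, then add Lap(r) per coordinate,
    r = 2*sqrt(p) / (n * lam * eps). Returns (theta_star, theta_private)."""
    n, p = len(xs), len(xs[0])
    theta = [0.0] * p
    for _ in range(steps):
        grad = [lam * t for t in theta]  # gradient of the regularizer
        for x, y in zip(xs, ys):
            margin = y * sum(t * xi for t, xi in zip(theta, x))
            coef = -y / (1.0 + math.exp(margin))  # d/dmargin of logistic loss
            for j in range(p):
                grad[j] += coef * x[j] / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
    r = 2.0 * math.sqrt(p) / (n * lam * eps)
    noisy = [t + r * (random.expovariate(1.0) - random.expovariate(1.0))
             for t in theta]  # Exp(1) - Exp(1) is Laplace(0, 1)
    return theta, noisy
```

Note that only `noisy` may be released; `theta` itself is not private, and the noise-reduction variant below replaces the single Laplace draw with a sequence of correlated draws.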
We consider the logistic loss function ℓ(θ, (X_i, y_i)) = log(1 + exp(−y_i θ⊤X_i)), and the empirical loss is

L(θ, D) = (1/n) Σ_{i=1}^{n} log(1 + exp(−y_i θ⊤X_i)) + (λ/2)‖θ‖₂².

⁵We study logistic regression for concreteness. Our method works for any ERM problem with strongly convex loss functions.

Algorithm 3 Covariance Perturbation with Noise-Reduction: COVNR(D, {ε₁, ..., ε_T}, α, γ)
  Input: private dataset D = (X, y), accuracy parameter α, privacy levels ε₁ < ε₂ < ... < ε_T, and failure probability γ
  Instantiate InteractiveAboveThreshold: A = IAT(D, ε₀, −α/2, Δ, ·) with ε₀ = 16Δ log(2T/γ)/α and Δ = (√(1/λ) + 1)²/n
  Let C = {a ∈ R^p | ‖a‖₂ ≤ √(1/λ)} and θ* = argmin_{θ∈C} L(θ)
  Compute noisy data:
    {Z^t} = NR(X⊤X, 2, {ε₁/2, ..., ε_T/2}),  {z^t} = NR(X⊤y, 2, {ε₁/2, ..., ε_T/2})
  for t = 1, ..., T do
    θ^t = argmin_{θ∈C} (1/2n)(θ⊤Z^tθ − 2⟨z^t, θ⟩) + (λ/2)‖θ‖₂²   (1)
    Let f^t(D) = L(θ*, D) − L(θ^t, D); query A with f^t to check accuracy
    if A returns (t, f^t) then Output (t, θ^t)  ▷ Accurate hypothesis found.
  Output: (⊥, θ*)

The output perturbation method simply adds Laplace noise to perturb each coordinate of the optimal solution θ*. The following is the formal guarantee of output perturbation. Our analysis deviates slightly from the one in [5] since we are adding Laplace noise (see the full version).

Theorem 3.3. Fix any ε > 0. Let r = 2√p / (nλε).
For any input dataset D, consider the mechanism that first computes θ* = argmin_{θ∈R^p} L(θ), then outputs θ_p = θ* + b, where b is a random vector with its entries drawn i.i.d. from Lap(r). Then M satisfies ε-differential privacy, and θ_p has excess risk

E_b[L(θ_p) − L(θ*)] ≤ 2√2 p / (nλε) + 4p² / (n²λε²).

Given the output perturbation method, we can simply apply the noise reduction method NR to the optimal hypothesis θ* to generate a sequence of noisy hypotheses. We will again use InteractiveAboveThreshold to check the excess risk of the hypotheses. The full algorithm OUTPUTNR follows the same structure as Algorithm 3, and we defer its formal description to the full version.

Theorem 3.4. The instantiation of OUTPUTNR(D, ε₀, {ε₁, ..., ε_T}, α, γ) is E-ex-post differentially private and outputs a hypothesis θ_p that with probability 1 − γ satisfies L(θ_p) − L(θ*) ≤ α, where the privacy loss function E : (([T] ∪ {⊥}) × R^p) → (R≥0 ∪ {∞}) is defined as E((k, ·)) = ε₀ + ε_k for any k ≠ ⊥, E((⊥, ·)) = ∞, and

ε₀ ≤ 32 log(2T/γ) √(2 log 2 / λ) / (nα)

is the privacy loss incurred by IAT.

Proof sketch of Theorems 3.2 and 3.4. The accuracy guarantees for both algorithms follow from an accuracy guarantee of the IAT algorithm (a variant on the standard AboveThreshold bound) and the fact that we output θ* if IAT identifies no accurate hypothesis. For the privacy guarantee, first note that any prefix of the noisy hypotheses θ₁, . . .
, \u03b8t satis\ufb01es \u03b5t-differential privacy because of\nour instantiation of the Laplace mechanism (see the full version for the (cid:96)1 sensitivity analysis) and\nnoise-reduction method NR. Then the ex-post privacy guarantee directly follows Lemma 2.8.\n\n4 Experiments\n\nTo evaluate the methods described above, we conducted empirical evaluations in two settings. We\nused ridge regression to predict (log) popularity of posts on Twitter in the dataset of [1], with p = 77\nfeatures and subsampled to n =100,000 data points. Logistic regression was applied to classifying\n\n7\n\n\f(a) Linear (ridge) regression,\n\nvs theory approach.\n\n(b) Regularized logistic regression,\n\nvs theory approach.\n\n(c) Linear (ridge) regression,\n\nvs DOUBLINGMETHOD.\n\n(d) Regularized logistic regression,\n\nvs DOUBLINGMETHOD.\n\nFigure 1: Ex-post privacy loss. (1a) and (1c), left, represent ridge regression on the Twitter dataset,\nwhere Noise Reduction and DOUBLINGMETHOD both use Covariance Perturbation. (1b) and (1d),\nright, represent logistic regression on the KDD-99 Cup dataset, where both Noise Reduction and\nDOUBLINGMETHOD use Output Perturbation. The top plots compare Noise Reduction to the \u201ctheory\napproach\u201d: running the algorithm once using the value of \u03b5 that guarantees the desired expected\nerror via a utility theorem. The bottom compares to the DOUBLINGMETHOD baseline. Note the top\nplots are generous to the theory approach: the theory curves promise only expected error, whereas\nNoise Reduction promises a high probability guarantee. Each point is an average of 80 trials (Twitter\ndataset) or 40 trials (KDD-99 dataset).\n\nnetwork events as innocent or malicious in the KDD-99 Cup dataset [13], with 38 features and\nsubsampled to 100,000 points. 
Details of parameters and methods appear in the full version.6

In each case, we tested the algorithm’s average ex-post privacy loss for a range of input accuracy goals α, fixing a modest failure probability γ = 0.1 (and we observed that excess risks were concentrated well below α/2, suggesting a pessimistic analysis). The results show our meta-method gives a large improvement over the “theory” approach of simply inverting utility theorems for private ERM algorithms. (In fact, the utility theorem for the popular private stochastic gradient descent algorithm does not even give meaningful guarantees for the ranges of parameters tested; one would need an order of magnitude more data points, and even then the privacy losses are enormous, perhaps due to loose constants in the analysis.)

To gauge the more modest improvement over DOUBLINGMETHOD, note that the variation in the privacy risk factor e^ε can still be very large; for instance, in the ridge regression setting at α = 0.05, Noise Reduction has e^ε ≈ 10.0 while DOUBLINGMETHOD has e^ε ≈ 495; at α = 0.075, the privacy risk factors are 4.65 and 56.6, respectively.

6A full implementation of our algorithms appears at: https://github.com/steven7woo/Accuracy-First-Differential-Privacy

Interestingly, for our meta-method, the
contribution to privacy loss from “testing” hypotheses (the InteractiveAboveThreshold technique) was significantly larger than that from “generating” them (NoiseReduction). One place where the InteractiveAboveThreshold analysis is loose is in its use of a theoretical bound on the maximum norm of any hypothesis to compute the sensitivity of queries. The actual norms of the hypotheses tested were significantly lower; if taken as guidance by the practitioner in advance, this would drastically improve the privacy guarantees of both adaptive methods.

5 Future Directions

Throughout this paper, we focus on ε-differential privacy rather than the weaker (ε, δ)-(approximate) differential privacy. Part of the reason is that an analogue of Lemma 2.8 does not seem to hold for (ε, δ)-differentially private queries without further assumptions: the need to union-bound over the δ “failure probability” that the privacy loss is bounded for each query can erase the ex-post gains. We leave obtaining similar results for approximate differential privacy as an open problem. More generally, we wish to extend our ex-post privacy framework to approximate differential privacy, or to the stronger notion of concentrated differential privacy [3]. Such results would allow us to obtain ex-post privacy guarantees for a much broader class of algorithms.

References

[1] The AMA Team at Laboratoire d’Informatique de Grenoble. Buzz prediction in online social media, 2017.

[2] Raef Bassily, Adam D. Smith, and Abhradeep Thakurta. Private empirical risk minimization, revisited. CoRR, abs/1405.7085, 2014.

[3] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds.
In Theory of Cryptography - 14th International Conference, TCC 2016-B, Beijing, China, October 31 - November 3, 2016, Proceedings, Part I, pages 635–658, 2016.

[4] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In Advances in Neural Information Processing Systems 21, Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 8-11, 2008, pages 289–296, 2008.

[5] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.

[6] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy and statistical minimax rates. In 51st Annual Allerton Conference on Communication, Control, and Computing, Allerton 2013, Allerton Park & Retreat Center, Monticello, IL, USA, October 2-4, 2013, page 1592, 2013.

[7] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265–284. Springer, 2006.

[8] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.

[9] Cynthia Dwork, Guy N. Rothblum, and Salil Vadhan. Boosting and differential privacy. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 51–60. IEEE, 2010.

[10] Giulia Fanti, Vasyl Pihur, and Úlfar Erlingsson. Building a RAPPOR with the unknown: Privacy-preserving learning of associations and data dictionaries. Proceedings on Privacy Enhancing Technologies (PoPETS), issue 3, 2016.

[11] Andy Greenberg.
Apple\u2019s \u2019differential privacy\u2019 is about collecting your data\u2014but not your data.\n\nWired Magazine, 2016.\n\n[12] Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning.\nIn COLT 2012 - The 25th Annual Conference on Learning Theory, June 25-27, 2012, Edinburgh,\nScotland, pages 24.1\u201324.34, 2012.\n\n[13] KDD\u201999. Kdd cup 1999 data, 1999.\n\n[14] Daniel Kifer, Adam D. Smith, and Abhradeep Thakurta. Private convex optimization for\nempirical risk minimization with applications to high-dimensional regression. In COLT 2012\n- The 25th Annual Conference on Learning Theory, June 25-27, 2012, Edinburgh, Scotland,\npages 25.1\u201325.40, 2012.\n\n[15] Fragkiskos Koufogiannis, Shuo Han, and George J. Pappas. Gradual release of sensitive data\n\nunder differential privacy. Journal of Privacy and Con\ufb01dentiality, 7, 2017.\n\n[16] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Foundations\nof Computer Science, 2007. FOCS\u201907. 48th Annual IEEE Symposium on, pages 94\u2013103. IEEE,\n2007.\n\n[17] Ryan M Rogers, Aaron Roth, Jonathan Ullman, and Salil Vadhan. Privacy odometers and\n\ufb01lters: Pay-as-you-go composition. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and\nR. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 1921\u20131929.\nCurran Associates, Inc., 2016.\n\n10\n\n\f[18] Benjamin I. P. Rubinstein, Peter L. Bartlett, Ling Huang, and Nina Taft. Learning in a large\nfunction space: Privacy-preserving mechanisms for SVM learning. CoRR, abs/0911.5708, 2009.\n\n[19] Adam Smith, Jalaj Upadhyay, and Abhradeep Thakurta. Is interaction necessary for distributed\n\nprivate learning? IEEE Symposium on Security and Privacy, 2017.\n\n[20] Shuang Song, Kamalika Chaudhuri, and Anand D. Sarwate. Stochastic gradient descent with\ndifferentially private updates. 
In IEEE Global Conference on Signal and Information Processing, GlobalSIP 2013, Austin, TX, USA, December 3-5, 2013, pages 245–248, 2013.

[21] Oliver Williams and Frank McSherry. Probabilistic inference and differential privacy. In Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada, pages 2451–2459, 2010.