{"title": "Certified Defenses for Data Poisoning Attacks", "book": "Advances in Neural Information Processing Systems", "page_first": 3517, "page_last": 3529, "abstract": "Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.", "full_text": "Certi\ufb01ed Defenses for Data Poisoning Attacks\n\nJacob Steinhardt\u21e4\nStanford University\n\njsteinha@stanford.edu\n\nPang Wei Koh\u21e4\nStanford University\n\npangwei@cs.stanford.edu\n\nPercy Liang\n\nStanford University\n\npliang@cs.stanford.edu\n\nAbstract\n\nMachine learning systems trained on user-provided data are susceptible to data\npoisoning attacks, whereby malicious users inject false training data with the aim of\ncorrupting the learned model. 
While recent work has proposed a number of attacks\nand defenses, little is understood about the worst-case loss of a defense in the\nface of a determined attacker. We address this by constructing approximate upper\nbounds on the loss across a broad family of attacks, for defenders that \ufb01rst perform\noutlier removal followed by empirical risk minimization. Our approximation relies\non two assumptions: (1) that the dataset is large enough for statistical concentration\nbetween train and test error to hold, and (2) that outliers within the clean (non-\npoisoned) data do not have a strong effect on the model. Our bound comes paired\nwith a candidate attack that often nearly matches the upper bound, giving us a\npowerful tool for quickly assessing defenses on a given dataset. Empirically, we\n\ufb01nd that even under a simple defense, the MNIST-1-7 and Dog\ufb01sh datasets are\nresilient to attack, while in contrast the IMDB sentiment dataset can be driven from\n12% to 23% test error by adding only 3% poisoned data.\n\n1\n\nIntroduction\n\nTraditionally, computer security seeks to ensure a system\u2019s integrity against attackers by creating\nclear boundaries between the system and the outside world (Bishop, 2002). In machine learning,\nhowever, the most critical ingredient of all\u2013the training data\u2013comes directly from the outside world.\nFor a system trained on user data, an attacker can inject malicious data simply by creating a user\naccount. Such data poisoning attacks require us to re-think what it means for a system to be secure.\nThe focus of the present work is on data poisoning attacks against classi\ufb01cation algorithms, \ufb01rst\nstudied by Biggio et al. (2012) and later by a number of others (Xiao et al., 2012; 2015b; Newell\net al., 2014; Mei and Zhu, 2015b; Burkard and Lagesse, 2017; Koh and Liang, 2017). This body\nof work has demonstrated data poisoning attacks that can degrade classi\ufb01er accuracy, sometimes\ndramatically. 
Moreover, while some defenses have been proposed against speci\ufb01c attacks (Laishram\nand Phoha, 2016), few have been stress-tested against a determined attacker.\nAre there defenses that are robust to a large class of data poisoning attacks? At development time,\none could take a clean dataset and test a defense against a number of poisoning strategies on that\ndataset. However, because of the near-limitless space of possible attacks, it is impossible to conclude\nfrom empirical success alone that a defense that works against a known set of attacks will not fail\nagainst a new attack.\nIn this paper, we address this dif\ufb01culty by presenting a framework for studying the entire space\nof attacks against a given defense. Our framework applies to defenders that (i) remove outliers\nresiding outside a feasible set, then (ii) minimize a margin-based loss on the remaining data. For such\ndefenders, we can generate approximate upper bounds on the ef\ufb01cacy of any data poisoning attack,\nwhich hold modulo two assumptions\u2014that the empirical train and test distribution are close together,\n\n\u21e4Equal contribution.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fand that the outlier removal does not signi\ufb01cantly change the distribution of the clean (non-poisoned)\ndata; these assumptions are detailed more formally in Section 3. We then establish a duality result for\nour upper bound, and use this to generate a candidate attack that nearly matches the bound. Both the\nupper bound and attack are generated via an ef\ufb01cient online learning algorithm.\nWe consider two different instantiations of our framework: \ufb01rst, where the outlier detector is trained\nindependently and cannot be affected by the poisoned data, and second, where the data poisoning can\nattack the outlier detector as well. 
In both cases we analyze binary SVMs, although our framework\napplies in the multi-class case as well.\nIn the \ufb01rst setting, we apply our framework to an \u201coracle\u201d defense that knows the true class centroids\nand removes points that are far away from the centroid of the corresponding class. While previous\nwork showed successful attacks on the MNIST-1-7 (Biggio et al., 2012) and Dog\ufb01sh (Koh and\nLiang, 2017) image datasets in the absence of any defenses, we show (Section 4) that no attack can\nsubstantially increase test error against this oracle\u2014the 0/1-error of an SVM on either dataset is at\nmost 4% against any of the attacks we consider, even after adding 30% poisoned data.1 Moreover,\nwe provide certi\ufb01ed upper bounds of 7% and 10% test error, respectively, on the two datasets. On\nthe other hand, on the IMDB sentiment corpus (Maas et al., 2011) our attack increases classi\ufb01cation\ntest error from 12% to 23% with only 3% poisoned data, showing that defensibility is very dataset-\ndependent: the high dimensionality and abundance of irrelevant features in the IMDB corpus give the\nattacker more room to construct attacks that evade outlier removal.\nFor the second setting, we consider a more realistic defender that uses the empirical (poisoned)\ncentroids. For small amounts of poisoned data (\uf8ff 5%) we can still certify the resilience of MNIST-\n1-7 and Dog\ufb01sh (Section 5). 
However, with more (30%) poisoned data, the attacker can subvert the outlier removal to obtain stronger attacks, increasing test error on MNIST-1-7 to 40%—much higher than the upper bound of 7% for the oracle defense. In other words, defenses that rely on the (potentially poisoned) data can be much weaker than their data-independent counterparts, underscoring the need for outlier removal mechanisms that are themselves robust to attack.

2 Problem Setting

Consider a prediction task from an input x ∈ X (e.g., X = R^d) to an output y ∈ Y; in our case we will take Y = {−1, +1} (binary classification), although most of our analysis holds for arbitrary Y. Let ℓ be a non-negative convex loss function: e.g., for linear classification with the hinge loss, ℓ(θ; x, y) = max(0, 1 − y⟨θ, x⟩) for a model θ ∈ Θ ⊆ R^d and data point (x, y). Given a true data-generating distribution p* over X × Y, define the test loss as L(θ) = E_{(x,y)∼p*}[ℓ(θ; x, y)].
We consider the causative attack model (Barreno et al., 2010), which consists of a game between two players: the defender (who seeks to learn a model θ), and the attacker (who wants the learner to learn a bad model). The game proceeds as follows:
• n data points are drawn from p* to produce a clean training dataset Dc.
• The attacker adaptively chooses a "poisoned" dataset Dp of εn poisoned points, where ε ∈ [0, 1] parametrizes the attacker's resources.
• The defender trains on the full dataset Dc ∪ Dp to produce a model θ̂, and incurs test loss L(θ̂).
The defender's goal is to minimize the quantity L(θ̂), while the attacker's goal is to maximize it.
Remarks. We assume the attacker has full knowledge of the defender's algorithm and of the clean training data Dc. 
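This game is straightforward to simulate. The sketch below is a toy illustration, not the paper's code: two Gaussian classes, a hinge-loss (SVM) learner trained by plain subgradient descent, and an attacker that injects εn copies of a single mislabeled point. All names and numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def hinge_loss(theta, X, y):
    # l(theta; x, y) = max(0, 1 - y<theta, x>), averaged over the set
    return np.maximum(0, 1 - y * (X @ theta)).mean()

def train_svm(X, y, epochs=200, lr=0.1):
    # plain subgradient descent on the average hinge loss
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        margin = y * (X @ theta)
        viol = margin < 1
        grad = -(y[viol, None] * X[viol]).sum(0) / len(y)
        theta -= lr * grad
    return theta

# clean data Dc: two well-separated Gaussian classes
n = 200
y_clean = np.repeat([1, -1], n // 2)
X_clean = rng.normal(0, 0.3, (n, 2)) + np.outer(y_clean, [1.0, 1.0])

# attacker injects eps*n copies of a single point with a flipped label
eps = 0.2
x_attack, y_attack = np.array([1.0, 1.0]), -1   # sits inside the +1 cluster
X_pois = np.vstack([X_clean, np.tile(x_attack, (int(eps * n), 1))])
y_pois = np.concatenate([y_clean, np.full(int(eps * n), y_attack)])

theta_clean = train_svm(X_clean, y_clean)
theta_pois = train_svm(X_pois, y_pois)

loss_clean = hinge_loss(theta_clean, X_clean, y_clean)  # baseline clean loss
loss_pois = hinge_loss(theta_pois, X_clean, y_clean)    # clean loss after poisoning
print(loss_clean, loss_pois)
```

Even this crude attack degrades the learned model, which is why the paper's defender filters the data before training rather than training naively on Dc ∪ Dp.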
While this may seem generous to the attacker, it is widely considered poor practice to rely on secrecy for security (Kerckhoffs, 1883; Biggio et al., 2014a); moreover, a determined attacker can often reverse-engineer necessary system details (Tramèr et al., 2016).
The causative attack model allows the attacker to add points but not modify existing ones. Indeed, systems constantly collect new data (e.g., product reviews, user feedback on social media, or insurance claims), whereas modification of existing data would require first compromising the system.
Attacks that attempt to increase the overall test loss L(θ̂), known as indiscriminate availability attacks (Barreno et al., 2010), can be thought of as denial-of-service attacks. This is in contrast to targeted

1We note Koh and Liang's attack on Dogfish targets specific test images rather than overall test error.

Figure 1: Different datasets possess very different levels of vulnerability to attack. Here, we visualize the effect of the sphere and slab oracle defenses, with thresholds chosen to match the 70th percentile of the clean data. We mark with an X our attacks for the respective values of ε. (a) For the MNIST-1-7 dataset, the classes are well-separated and no attack can get past the defense. Note that our attack chooses to put all of its weight on the negative class here, although this need not be true in general. (b) For the IMDB dataset, the class centroids are not well-separated and it is easy to attack the classifier. See Section 4 for more details about the experiments.

attacks on individual examples or sub-populations (e.g., Burkard and Lagesse, 2017). 
Both have serious security implications, but we focus on denial-of-service attacks, as they compromise the model in a broad sense and interfere with fundamental statistical properties of learning algorithms.

2.1 Data Sanitization Defenses

A defender who trains naïvely on the full (clean + poisoned) data Dc ∪ Dp is doomed to failure, as even a single poisoned point can in some cases arbitrarily change the model (Liu and Zhu, 2016; Park et al., 2017). In this paper, we consider data sanitization defenses (Cretu et al., 2008), which examine the full dataset and try to remove the poisoned points, for example by deleting outliers. Formally, the defender constructs a feasible set F ⊆ X × Y and trains only on points in F:

θ̂ := argmin_{θ∈Θ} L(θ; (Dc ∪ Dp) ∩ F), where L(θ; S) := Σ_{(x,y)∈S} ℓ(θ; x, y).   (1)

Given such a defense F, we would like to upper bound the worst possible test loss over any attacker (choice of Dp)—in symbols, max_{Dp} L(θ̂). Such a bound would certify that the defender incurs at most some loss no matter what the attacker does. We consider two classes of defenses:
• Fixed defenses, where F does not depend on Dp. One example for text classification is letting F be documents that contain only licensed words (Newell et al., 2014). Other examples are oracle defenders that depend on the true distribution p*. While such defenders are not implementable in practice, they provide bounds: if even an oracle can be attacked, then we should be worried.
• Data-dependent defenses, where F depends on Dc ∪ Dp. These defenders try to estimate p* from Dc ∪ Dp and thus are implementable in practice. However, they open up a new line of attack wherein the attacker chooses the poisoned data Dp to change the feasible set F.

Example defenses for binary classification. Let µ+ := E[x | y = +1] and µ− := E[x | y = −1] be the centroids of the positive and negative classes. A natural defense strategy is to remove points that are too far away from the corresponding centroid. We consider two ways of doing this: the sphere defense, which removes points outside a spherical radius, and the slab defense, which first projects points onto the line between the centroids and then discards points that are too far on this line:

F_sphere := {(x, y) : ‖x − µ_y‖₂ ≤ r_y},   F_slab := {(x, y) : |⟨x − µ_y, µ_y − µ_{−y}⟩| ≤ s_y}.   (2)

Here r_y, s_y are thresholds (e.g., chosen so that 30% of the data is removed). Note that both defenses are oracles (µ_y depends on p*); in Section 5, we consider versions that estimate µ from Dc ∪ Dp.
Figure 1 depicts both defenses on the MNIST-1-7 and IMDB datasets. Intuitively, the constraints on MNIST-1-7 make it difficult for an attacker, whereas IMDB looks far more attackable. In the next section, we will see how to make these intuitions concrete.

Algorithm 1 Online learning algorithm for generating an upper bound and candidate attack.
Input: clean data Dc of size n, feasible set F, radius ρ, poisoned fraction ε, step size η.
Initialize z(0) ← 0, λ(0) ← 1/η, θ(0) ← 0, U* ← ∞.
for t = 1, ..., εn do
  Compute (x(t), y(t)) = argmax_{(x,y)∈F} ℓ(θ(t−1); x, y).
  U* ← min(U*, (1/n) L(θ(t−1); Dc) + ε ℓ(θ(t−1); x(t), y(t))).
  g(t) ← (1/n) ∇L(θ(t−1); Dc) + ε ∇ℓ(θ(t−1); x(t), y(t)).
  Update: z(t) ← z(t−1) − g(t); λ(t) ← max(λ(t−1), ‖z(t)‖₂ / ρ); θ(t) ← z(t) / λ(t).
end for
Output: upper bound U* and candidate attack Dp = {(x(t), y(t))}, t = 1, ..., εn.

3 Attack, Defense, and Duality

Recall that we are interested in the worst-case test loss max_{Dp} L(θ̂). To make progress, we consider three approximations. 
First, (i) we pass from the test loss to the training loss on the clean data, and (ii) we consider the training loss on the full (clean + poisoned) data, which upper bounds the loss on the clean data due to non-negativity of the loss. For any model θ, we then have:

L(θ) ≈(i) (1/n) L(θ; Dc) ≤(ii) (1/n) L(θ; Dc ∪ Dp).   (3)

The approximation (i) could potentially be invalid due to overfitting; however, if we regularize the model appropriately then we can show that train and test are close by standard concentration arguments (see Appendix B for details). Note that (ii) is always a valid upper bound, and will be relatively tight as long as the model ends up fitting the poisoned data well.
For our final approximation, we (iii) have the defender train on Dc ∪ (Dp ∩ F) (i.e., it uses the entire clean data set Dc rather than just the inliers Dc ∩ F). This should not have a large effect as long as the defense is not too aggressive (i.e., as long as F is not so small that it would remove important points from the clean data Dc). We denote the resulting model as θ̃ to distinguish it from θ̂.
Putting it all together, the worst-case test loss from any attack Dp with εn elements is approximately upper bounded as follows:

max_{Dp} L(θ̂) ≈(i) max_{Dp} (1/n) L(θ̂; Dc) ≤(ii) max_{Dp} (1/n) L(θ̂; Dc ∪ (Dp ∩ F)) ≈(iii) max_{Dp} (1/n) L(θ̃; Dc ∪ (Dp ∩ F)) = max_{Dp⊆F} min_{θ∈Θ} (1/n) L(θ; Dc ∪ Dp) =: M.   (4)

Here the final step is because θ̃ is chosen to minimize L(θ; Dc ∪ (Dp ∩ F)). The minimax loss M defined in (4) is the central quantity that we will focus on in the sequel; it has duality properties that will yield insight into the nature of the optimal attack. 
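Step (ii) of this chain is exact rather than approximate: because the loss is non-negative, summing extra terms over Dp can only increase the unnormalized loss L. A quick numerical check, with toy random data and illustrative names only:

```python
import numpy as np

rng = np.random.default_rng(1)

def total_hinge(theta, X, y):
    # unnormalized L(theta; S) = sum over S of max(0, 1 - y<theta, x>)
    return np.maximum(0, 1 - y * (X @ theta)).sum()

d, n, n_pois = 5, 100, 30
theta = rng.normal(size=d)
X_c, y_c = rng.normal(size=(n, d)), rng.choice([-1, 1], n)            # clean Dc
X_p, y_p = rng.normal(size=(n_pois, d)), rng.choice([-1, 1], n_pois)  # poisoned Dp

lhs = total_hinge(theta, X_c, y_c) / n                                # (1/n) L(theta; Dc)
rhs = total_hinge(theta, np.vstack([X_c, X_p]),
                  np.concatenate([y_c, y_p])) / n                     # (1/n) L(theta; Dc u Dp)
# step (ii) of Eq. (3): adding non-negative terms can only increase L
print(lhs, rhs)
```

The inequality lhs ≤ rhs holds for any θ and any Dp, which is what makes the full-data loss a certifiable surrogate for the clean-data loss.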
Intuitively, the attacker that achieves M is trying to maximize the loss on the full dataset by adding poisoned points from the feasible set F. The approximations (i) and (iii) define the assumptions we need for our certificates to hold; as long as both approximations are valid, M will give an approximate upper bound on the worst-case test loss.

3.1 Fixed Defenses: Computing the Minimax Loss via Online Learning

We now focus on computing the minimax loss M in (4) when F is not affected by Dp (fixed defenses). In the process of computing M, we will also produce candidate attacks. Our algorithm is based on no-regret online learning, which models a game between a learner and nature and thus is a natural fit to our data poisoning setting. For simplicity of exposition we assume Θ is an ℓ2-ball of radius ρ.
Our algorithm, shown in Algorithm 1, is very simple: in each iteration, it alternates between finding the worst attack point (x(t), y(t)) with respect to the current model θ(t−1) and updating the model in the direction of the attack point, producing θ(t). The attack Dp is the set of points thus found.
To derive the algorithm, we simply swap min and max in (4) to get an upper bound on M, after which the optimal attack set Dp ⊆ F for a fixed θ is realized by a single point (x, y) ∈ F:

M ≤ min_{θ∈Θ} max_{Dp⊆F} (1/n) L(θ; Dc ∪ Dp) = min_{θ∈Θ} U(θ), where U(θ) := (1/n) L(θ; Dc) + ε max_{(x,y)∈F} ℓ(θ; x, y).   (5)

Note that U(θ) upper bounds M for any model θ. Algorithm 1 follows the natural strategy of minimizing U(θ) to iteratively tighten this upper bound. In the process, the iterates {(x(t), y(t))} form a candidate attack Dp whose induced loss (1/n) L(θ̃; Dc ∪ Dp) is a lower bound on M. We can monitor the duality gap between lower and upper bounds on M to ascertain the quality of the bounds.
Moreover, since the loss ℓ is convex in θ, U(θ) is convex in θ (regardless of the structure of F, which could even be discrete). In this case, if we minimize U(θ) using any online learning algorithm with sublinear regret, the duality gap vanishes for large datasets. In particular (proof in Appendix A):
Proposition 1. Assume the loss ℓ is convex. Suppose that an online learning algorithm (e.g., Algorithm 1) is used to minimize U(θ), and that the parameters (x(t), y(t)) maximize the loss ℓ(θ(t−1); x, y) for the iterates θ(t−1) of the online learning algorithm. Let U* = min_{t=1,...,εn} U(θ(t)). Also suppose that the learning algorithm has regret Regret(T) after T time steps. Then, for the attack Dp = {(x(t), y(t))}, t = 1, ..., εn, the corresponding parameter θ̃ satisfies:

(1/n) L(θ̃; Dc ∪ Dp) ≤ M ≤ U*   and   U* − (1/n) L(θ̃; Dc ∪ Dp) ≤ Regret(εn) / (εn).   (6)

Hence, any algorithm whose average regret Regret(εn)/(εn) is small will have a nearly optimal candidate attack Dp. There are many algorithms that have this property (Shalev-Shwartz, 2011); the particular algorithm depicted in Algorithm 1 is a variant of regularized dual averaging (Xiao, 2010). In summary, we have a simple learning algorithm that computes an upper bound on the minimax loss along with a candidate attack (which provides a lower bound). Of course, the minimax loss M is only an approximation to the true worst-case test loss (via (4)). 
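As a concrete, heavily simplified sketch of Algorithm 1: the code below uses the hinge loss and replaces the inner argmax over F, which the paper solves exactly as a QP, with brute force over a finite candidate grid. The z, λ, θ update follows the regularized-dual-averaging pseudocode; the data and grid are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def hinge(theta, x, y):
    return max(0.0, 1.0 - y * float(theta @ x))

def grad_hinge(theta, x, y):
    return -y * x if y * float(theta @ x) < 1.0 else np.zeros_like(x)

# clean data Dc and a finite stand-in for the feasible set F
n, d = 200, 2
y_c = np.repeat([1, -1], n // 2)
X_c = rng.normal(0, 0.3, (n, 2)) + np.outer(y_c, [1.0, 1.0])
grid = [(np.array([a, b]), y) for a in np.linspace(-2, 2, 9)
        for b in np.linspace(-2, 2, 9) for y in (-1, 1)]

eps, eta, rho = 0.1, 0.5, 5.0   # poisoned fraction, step size, radius of Theta
z, lam, theta, U_star = np.zeros(d), 1.0 / eta, np.zeros(d), np.inf
attack = []
for t in range(int(eps * n)):
    # worst feasible attack point for the current model
    x_t, y_t = max(grid, key=lambda p: hinge(theta, p[0], p[1]))
    attack.append((x_t, y_t))
    losses = np.maximum(0, 1 - y_c * (X_c @ theta))
    U_star = min(U_star, losses.mean() + eps * hinge(theta, x_t, y_t))
    # g = (1/n) grad L(theta; Dc) + eps * grad l(theta; x_t, y_t)
    viol = losses > 0
    g = -(y_c[viol, None] * X_c[viol]).sum(0) / n + eps * grad_hinge(theta, x_t, y_t)
    z -= g
    lam = max(lam, np.linalg.norm(z) / rho)   # keep theta inside the l2-ball of radius rho
    theta = z / lam
print(U_star, len(attack))
```

The loop returns both the certified upper bound U* and the candidate attack points, mirroring the algorithm's dual role.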
We examine the tightness of this approximation empirically in Section 4.

3.2 Data-Dependent Defenses: Upper and Lower Bounds

We now turn our attention to data-dependent defenders, where the feasible set F depends on the data Dc ∪ Dp (and hence can be influenced by the attacker). For example, consider the slab defense (see (2)) that uses the empirical (poisoned) mean instead of the true mean:

F_slab(Dp) := {(x, y) : |⟨x − µ̂_y(Dp), µ̂_y(Dp) − µ̂_{−y}(Dp)⟩| ≤ s_y},   (7)

where µ̂_y(Dp) is the empirical mean over Dc ∪ Dp; the notation F(Dp) tracks the dependence of the feasible set on Dp. Similarly to Section 3.1, we analyze the minimax loss M, which we can bound as in (5): M ≤ min_{θ∈Θ} max_{Dp⊆F(Dp)} (1/n) L(θ; Dc ∪ Dp).
However, unlike in (5), it is no longer the case that the optimal Dp places all points at a single location, due to the dependence of F on Dp; we must jointly maximize over the full set Dp. To improve tractability, we take a continuous relaxation: we think of Dp as a probability distribution with mass 1/(εn) on each point in Dp, and relax this to allow any probability distribution π_p. The constraint then becomes supp(π_p) ⊆ F(π_p) (where supp denotes the support), and the analogue to (5) is

M ≤ min_{θ∈Θ} Ũ(θ), where Ũ(θ) := (1/n) L(θ; Dc) + ε max_{supp(π_p)⊆F(π_p)} E_{π_p}[ℓ(θ; x, y)].   (8)

This suggests again employing Algorithm 1 to minimize Ũ(θ). Indeed, this is what we shall do, but there are a few caveats:
• The maximization problem in the definition of Ũ(θ) is in general quite difficult. We will, however, solve a specific instance in Section 5 based on the sphere/slab defense described in Section 2.1.
• The constraint set for π_p is non-convex, so duality (Proposition 1) no longer holds. In particular, the average of two feasible π_p might not itself be feasible.
To partially address the second issue, we will run Algorithm 1, at each iteration obtaining a distribution π_p(t) and upper bound Ũ(θ(t)). Then, for each π_p(t) we will generate a candidate attack by sampling εn points from π_p(t), and take the best resulting attack. In Section 4 we will see that despite a lack of rigorous theoretical guarantees, this often leads to good upper bounds and attacks in practice.

Figure 2: On the (a) Dogfish and (b) MNIST-1-7 datasets, our candidate attack (solid blue) achieves the upper bound (dashed blue) on the worst-case train loss, as guaranteed by Proposition 1. Moreover, this worst-case loss is low; even after adding 30% poisoned data, the loss stays below 0.1. (c) The gradient descent (dash-dotted) and label flip (dotted) baseline attacks are suboptimal under this defense, with test loss (red) as well as test error and train loss (not shown) all significantly worse than our candidate attack.

4 Experiments I: Oracle Defenses

An advantage of our framework is that we obtain a tool that can be easily run on new datasets and defenses to learn about the robustness of the defense and gain insight into potential attacks. We first study two image datasets: MNIST-1-7, and the Dogfish dataset used by Koh and Liang (2017). For MNIST-1-7, following Biggio et al. (2012), we considered binary classification between the digits 1 and 7; this left us with n = 13007 training examples of dimension 784. 
For Dogfish, which is a binary classification task, we used the same Inception-v3 features as in Koh and Liang (2017), so that each of the n = 1800 training images is represented by a 2048-dimensional vector. For this and subsequent experiments, our loss ℓ is the hinge loss (i.e., we train an SVM).
We consider the combined oracle slab and sphere defense from Section 2.1: F = F_slab ∩ F_sphere. To run Algorithm 1, we need to maximize the loss over (x, y) ∈ F. Note that maximizing the hinge loss ℓ(θ; x, y) is equivalent to minimizing y⟨θ, x⟩. Therefore, we can solve the following quadratic program (QP) for each y ∈ {+1, −1} and take the one with higher loss:

minimize_{x ∈ R^d} y⟨θ, x⟩ subject to ‖x − µ_y‖₂² ≤ r_y², |⟨x − µ_y, µ_y − µ_{−y}⟩| ≤ s_y.   (9)

The results of Algorithm 1 are given in Figures 2a and 2b; here and elsewhere, we used a combination of CVXPY (Diamond and Boyd, 2016), YALMIP (Löfberg, 2004), SeDuMi (Sturm, 1999), and Gurobi (Gurobi Optimization, Inc., 2016) to solve the optimization. We plot the upper bound U* computed by Algorithm 1, as well as the train and test loss induced by the corresponding attack Dp. Except for small ε, the model θ̃ fits the poisoned data almost perfectly. We think this is because all feasible attack points that can get past the defense can be easily fit without sacrificing the quality of the rest of the model; in particular, the model chooses to fit the attack points as soon as ε is large enough that there is incentive to do so.
The upshot is that, in this case, the loss L(θ̃; Dc) on the clean data nearly matches its upper bound L(θ̃; Dc ∪ Dp) (which in turn matches U*). 
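For illustration, the QP in (9) can be approximated with scipy's general-purpose SLSQP solver in place of the CVXPY/Gurobi stack the paper actually used. The centroids, radii, and thresholds below are made-up toy values, and the helper names are invented:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d = 5
theta = rng.normal(size=d)                        # current model iterate theta^(t-1)
mu = {1: np.full(d, 1.0), -1: np.full(d, -1.0)}   # oracle class centroids mu_y
r = {1: 2.0, -1: 2.0}                             # sphere radii r_y
s = {1: 1.0, -1: 1.0}                             # slab thresholds s_y

def worst_point(y):
    # QP (9): minimize y<theta, x> over the sphere/slab feasible set for class y
    w = mu[y] - mu[-y]                            # slab direction mu_y - mu_{-y}
    cons = [
        {"type": "ineq", "fun": lambda x: r[y] ** 2 - np.sum((x - mu[y]) ** 2)},
        {"type": "ineq", "fun": lambda x: s[y] - (x - mu[y]) @ w},
        {"type": "ineq", "fun": lambda x: s[y] + (x - mu[y]) @ w},
    ]
    res = minimize(lambda x: y * (theta @ x), mu[y], method="SLSQP", constraints=cons)
    return res.x, y, max(0.0, 1.0 - y * (theta @ res.x))

# solve for each label and keep whichever attains the higher hinge loss
x_b, y_b, loss_b = max((worst_point(y) for y in (1, -1)), key=lambda t: t[2])
print(loss_b)
```

A general nonlinear solver is a sketch only; for the convex QP (9) a dedicated solver gives certified optima, which matters when the result feeds a certificate.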
On both datasets, the certified upper bound U* is small (< 0.1 with ε = 0.3), showing that the datasets are resilient to attack under the oracle defense.
We also ran the candidate attack from Algorithm 1 as well as two baselines—gradient descent on the test loss (varying the location of points in Dp, as in Biggio et al. (2012) and Mei and Zhu (2015b)), and a simple baseline that inserts copies of points from Dc with the opposite label (subject to the flipped points lying in F). The results are in Figure 2c. Our attack consistently performs strongest; label flipping seems to be too weak, while the gradient algorithm seems to get stuck in local minima.2 Though it is not shown in the figure, we note that the maximum test 0-1 error against any attack, for ε up to 0.3, was 4%, confirming the robustness suggested by our certificates.
Finally, we visualize our attack in Figure 1a. Interestingly, though the attack was free to place points anywhere, most of the attack is tightly concentrated around a single point at the boundary of F.

2Though Mei and Zhu (2015b) state that their cost is convex, they communicated to us that this is incorrect.

Figure 3: The (a) Enron and (b) IMDB text datasets are significantly easier to attack under the oracle sphere and slab defense than the image datasets from Figure 2. (c) In particular, our attack achieves a large increase in test loss (solid red) and test error (solid purple) with small ε for IMDB. The label flip baseline was unsuccessful as before, and the gradient baseline does not apply to discrete data. In (a) and (b), note the large gap between upper and lower bounds, resulting from the upper bound relaxation and the IQP/randomized rounding approximations.

4.1 Text Data: Handling Integrity Constraints

We next consider attacks on text data. 
Beyond the sphere and slab constraints, a valid attack on text data must satisfy additional integrity constraints (Newell et al., 2014): for text, the input x consists of binary indicator features (e.g., presence of the word "banana") rather than arbitrary reals.3 Algorithm 1 still applies in this case—the only difference is that the QP from Section 4 has the added constraint x ∈ Z^d_{≥0} and hence becomes an integer quadratic program (IQP), which can be computationally expensive to solve. We can still obtain upper bounds simply by relaxing the integrity constraints; the only issue is that the points x(t) in the corresponding attack will have continuous values, and hence don't correspond to actual text inputs. To address this, we use the IQP solver from Gurobi (Gurobi Optimization, Inc., 2016) to find an approximately optimal feasible x. This yields a valid candidate attack, but it might not be optimal if the solver doesn't find near-optimal solutions.
We ran both the upper bound relaxation and the IQP solver on two text datasets, the Enron spam corpus (Metsis et al., 2006) and the IMDB sentiment corpus (Maas et al., 2011). The Enron training set consists of n = 4137 e-mails (30% spam and 70% non-spam), with d = 5166 distinct words. The IMDB training set consists of n = 25000 product reviews with d = 89527 distinct words. We used bag-of-words features, which yields test accuracy 97% and 88%, respectively, in the absence of poisoned data. IMDB was too large for Gurobi to even approximately solve the IQP, so we resorted to a randomized rounding heuristic to convert the continuous relaxation to an integer solution.
Results are given in Figure 3; there is a relatively large gap between the upper bound and the attack. Despite this, the attacks are relatively successful. 
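The paper does not spell out its rounding heuristic, but a typical randomized-rounding scheme, sketched below with toy data and an invented feasibility check, treats each fractional coordinate of the relaxed solution as an independent Bernoulli probability and keeps the best feasible sample:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 30
theta = rng.normal(size=d)          # current model iterate
y = -1                              # label of the attack point
x_relaxed = rng.uniform(0, 1, d)    # stand-in for the relaxed (continuous) QP solution

def hinge(x):
    return max(0.0, 1.0 - y * float(theta @ x))

def feasible(x):
    # invented stand-in for the sphere/slab + integrity checks: at most 15 words present
    return x.sum() <= 15

best_x, best_loss = None, -1.0
for _ in range(200):
    # round coordinate j to 1 with probability x_relaxed[j]
    x_int = (rng.uniform(size=d) < x_relaxed).astype(float)
    if feasible(x_int) and hinge(x_int) > best_loss:
        best_x, best_loss = x_int, hinge(x_int)
print(best_loss)
```

The rounded point is a valid binary input, so it yields a genuine lower-bound attack, but its loss can fall well short of the continuous relaxation, which is one source of the gap visible in Figure 3.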
Most striking is the attack on IMDB, which increases test error from 12% to 23% for ε = 0.03, despite having to pass the oracle defender.
To understand why the attacks are so much more successful in this case, we can consult Figure 1b. In contrast to MNIST-1-7, for IMDB the defenses place few constraints on the attacker. This seems to be a consequence of the high dimensionality of IMDB and the large number of irrelevant features, which increase the size of F without a corresponding increase in separation between the classes.

5 Experiments II: Data-Dependent Defenses

We now revisit the MNIST-1-7 and Dogfish datasets. Before, we saw that they were unattackable provided we had an oracle defender that knew the true class means. If we instead consider a data-dependent defender that uses the empirical (poisoned) means, how much can this change the attackability of these datasets? In this section, we will see that the answer is quite a lot.
As described in Section 3.2, we can still use our framework to obtain upper and lower bounds even in this data-dependent case, although the bounds won't necessarily match. The main difficulty is in computing Ũ(θ), which involves a potentially intractable maximization (see (8)). However, for 2-class SVMs there is a tractable semidefinite programming algorithm; the full details are in

3Note that in the previous section, we ignored such integrity constraints for simplicity.

Figure 4: The data-dependent sphere and slab defense is significantly weaker than its oracle counterpart, allowing MNIST-1-7 and Dogfish to be successfully attacked. (a) On MNIST-1-7, our attack achieves a test loss of 0.69 (red) and error of 0.40 (not shown) at ε = 0.3, more than 10× its oracle counterpart (gold). At low ε ≤ 0.05, the dataset is safe, with a max train loss of 0.12. We saw qualitatively similar results on Dogfish. 
(b) Data-dependent sanitization can be significantly poisoned by coordinated adversarial data. We show here our attack for ε = 0.3, which places almost all of its attacking mass on the red X. This shifts the empirical centroid, rotating the slab constraint (from red to orange) and allowing the red X to be placed far on the other side of the blue centroid.

Appendix D, but the rough idea is the following: we can show that the optimal distribution π_p in (8) is supported on at most 4 points (one support vector and one non-support vector in each class). Moreover, for a fixed π_p, the constraints and objective depend only on inner products between a small number of points: the 4 attack points, the class means µ (on the clean data), and the model θ. Thus, we can solve for the optimal attack locations with a semidefinite program on a 7 × 7 matrix. Then in an outer loop, we randomly sample π_p from the probability simplex and take the one with the highest loss. Running this algorithm on MNIST-1-7 yields the results in Figure 4a. On the test set, our ε = 0.3 attack leads to a hinge loss of 0.69 (up from 0.03) and a 0-1 loss of 0.40 (up from 0.01). Similarly, on Dogfish, our ε = 0.3 attack gives a hinge loss of 0.59 (up from 0.05) and a 0-1 loss of 0.22 (up from 0.01).
The geometry of the attack is depicted in Figure 4b. By carefully choosing the location of the attack, the attacker can place points that lie substantially outside the original (clean) feasible set. This is because the poisoned data can substantially change the direction of the slab constraint, while the sphere constraint by itself is not enough to effectively filter out attacks. 
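This centroid-shifting effect is easy to verify numerically. In the toy sketch below (made-up geometry, not the paper's data), concentrating the poisoned mass at a single mislabeled point rotates the empirical slab direction by tens of degrees:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X_pos = rng.normal(0, 0.1, (n, 2)) + [1.0, 0.0]    # clean +1 class
X_neg = rng.normal(0, 0.1, (n, 2)) + [-1.0, 0.0]   # clean -1 class

eps = 0.3
x_attack = np.array([1.0, 6.0])                    # all poisoned mass at one point, labeled -1
X_neg_pois = np.vstack([X_neg, np.tile(x_attack, (int(eps * 2 * n), 1))])

def slab_direction(Xp, Xn):
    # unit vector between empirical class centroids, as used by the slab constraint
    v = Xp.mean(0) - Xn.mean(0)
    return v / np.linalg.norm(v)

v_clean = slab_direction(X_pos, X_neg)
v_pois = slab_direction(X_pos, X_neg_pois)
angle = np.degrees(np.arccos(np.clip(v_clean @ v_pois, -1, 1)))
print(angle)
```

Once the slab direction has rotated this far, points that the clean slab would have rejected fall inside the poisoned feasible set, which is exactly the failure mode shown in Figure 4b.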
There thus appears to be significant danger in employing data-dependent defenders: beyond the greater difficulty of analyzing them, they seem to actually be more vulnerable to attack.

6 Related Work

Due to their increased use in security-critical settings such as malware detection, there has been an explosion of work on the security of machine learning systems; see Barreno et al. (2010), Biggio et al. (2014a), Papernot et al. (2016b), and Gardiner and Nagaraja (2016) for some recent surveys. Our contribution relates to the long line of work on data poisoning attacks; beyond linear classifiers, others have studied the LASSO (Xiao et al., 2015a), clustering (Biggio et al., 2013; 2014c), PCA (Rubinstein et al., 2009), topic modeling (Mei and Zhu, 2015a), collaborative filtering (Li et al., 2016), neural networks (Yang et al., 2017), and other models (Mozaffari-Kermani et al., 2015; Vuurens et al., 2011; Wang, 2016). There have also been a number of demonstrated vulnerabilities in deployed systems (Newsome et al., 2006; Laskov and Šrndić, 2014; Biggio et al., 2014b). We provide formal scaffolding to this line of work by supplying a tool that can certify defenses against a range of attacks.

A striking security vulnerability recently discovered in machine learning systems is adversarial test images that fool image classifiers despite being nearly indistinguishable from normal images (Szegedy et al., 2014; Goodfellow et al., 2015; Carlini et al., 2016; Kurakin et al., 2016; Papernot et al., 2016a). These images exploit vulnerabilities at test time, whereas data poisoning is a vulnerability at training time. However, recent adversarial attacks on reinforcement learners (Huang et al., 2017; Behzadan and Munir, 2017; Lin et al., 2017) do blend train and test vulnerabilities.
A common defense against adversarial test examples is adversarial training (Goodfellow et al., 2015), which alters the training objective to encourage robustness.

We note that generative adversarial networks (Goodfellow et al., 2014), despite their name, are not focused on security but rather provide a game-theoretic objective for training generative models.

Finally, a number of authors have studied the theoretical question of learning in the presence of adversarial errors, under a priori distributional assumptions on the data. Robust algorithms have been exhibited for mean and covariance estimation and clustering (Diakonikolas et al., 2016; Lai et al., 2016; Charikar et al., 2017), classification (Klivans et al., 2009; Awasthi et al., 2014), regression (Nasrabadi et al., 2011; Nguyen and Tran, 2013; Chen et al., 2013; Bhatia et al., 2015), and crowdsourced data aggregation (Steinhardt et al., 2016). However, these bounds only hold for specific (sometimes quite sophisticated) algorithms and are focused on good asymptotic performance, rather than on giving good numerical error guarantees for concrete datasets/defenses.

7 Discussion

In this paper we have presented a tool for studying data poisoning defenses that goes beyond empirical validation by providing certificates against a large family of attacks, modulo the approximations from Section 3. We stress that our bounds are meant to be used as a way to assess defense strategies in the design stage, rather than guaranteeing performance of a deployed learning algorithm (since our method needs to be run on the clean data, which we presumably would not have access to at deployment time).
For instance, if we want to build robust defenses for image classifiers, we can assess the performance against attacks on a number of known image datasets, in order to gain more confidence in the robustness of the system that we actually deploy.

Having applied our framework to binary SVMs, there are a number of extensions we can consider: e.g., to other loss functions or to multiclass classification. We can also consider defenses beyond the sphere and slab constraints considered here: for instance, sanitizing text data using a language model, or using the covariance structure of the clean data (Lakhina et al., 2004). The main requirement of our framework is the ability to efficiently maximize ℓ(θ; x, y) over all feasible x and y. For margin-based classifiers such as SVMs and logistic regression, this only requires maximizing a linear function over the feasible set, which is often possible (e.g., via dynamic programming) even for discrete sets.

Our framework currently does not handle non-convex losses: while our method might still be meaningful as a way of generating attacks, our upper bounds would no longer be valid. The issue is that an attacker could try to thwart the optimization process and cause the defender to end up in a bad local minimum. Finding ways to rule this out without relying on convexity would be quite interesting. Separately, the bound L(θ̂) ≤ M was useful because M admits the natural minimax formulation (5), but the worst-case L(θ̂) can be expressed directly as a bilevel optimization problem (Mei and Zhu, 2015b), which is intractable in general but admits a number of heuristics (Bard, 1999).
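The loss-maximization requirement above can be illustrated for the hinge loss over a sphere-shaped feasible set, where the linear maximization has a closed form; the function names and the specific feasible set are our illustrative choices:

```python
import numpy as np

def hinge(theta, x, y):
    """Hinge loss of the linear classifier theta at the point (x, y)."""
    return max(0.0, 1.0 - y * np.dot(theta, x))

def worst_feasible_point(theta, center, r, y):
    """Maximize hinge(theta, x, y) over the sphere ||x - center|| <= r.

    The hinge loss is a non-increasing function of the linear score
    y * <theta, x>, so maximizing it only requires minimizing a linear
    function over the feasible set: push x as far as the sphere allows
    in the direction -y * theta. No search is needed.
    """
    direction = -y * theta / np.linalg.norm(theta)
    return center + r * direction
```

No feasible point attains a higher loss than this closed-form maximizer, which is what makes the upper bound efficiently computable; for discrete feasible sets the same linear objective can instead be maximized by dynamic programming, as noted above.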
Bilevel optimization has been considered in the related setting of Stackelberg games (Brückner and Scheffer, 2011; Brückner et al., 2012; Zhou and Kantarcioglu, 2016), and is natural to apply here as well.

To conclude, we quote Biggio et al., who call for the following methodology for evaluating defenses:

    To pursue security in the context of an arms race it is not sufficient to react to observed attacks, but it is also necessary to proactively anticipate the adversary by predicting the most relevant, potential attacks through a what-if analysis; this allows one to develop suitable countermeasures before the attack actually occurs, according to the principle of security by design.

The existing paradigm for such proactive anticipation is to design various hypothetical attacks against which to test the defenses. However, such an evaluation is fundamentally limited because it leaves open the possibility that there is a more clever attack that we failed to think of. Our approach provides a first step towards surpassing this limitation, by not just anticipating but certifying the reliability of a defender, thus implicitly considering an infinite number of attacks before they occur.

Reproducibility. The code and data for replicating our experiments are available on GitHub (http://bit.ly/gt-datapois) and Codalab Worksheets (http://bit.ly/cl-datapois).

Acknowledgments. JS was supported by a Fannie & John Hertz Foundation Fellowship and an NSF Graduate Research Fellowship. This work was also partially supported by a Future of Life Institute grant and a grant from the Open Philanthropy Project. We are grateful to Daniel Selsam, Zhenghao Chen, and Nike Sun, as well as to the anonymous reviewers, for a great deal of helpful feedback.

References

P. Awasthi, M. F. Balcan, and P. M. Long. The power of localization for efficiently learning linear separators with noise.
In Symposium on Theory of Computing (STOC), pages 449–458, 2014.

J. F. Bard. Practical Bilevel Optimization: Algorithms and Applications. Springer, 1999.

M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar. The security of machine learning. Machine Learning, 81(2):121–148, 2010.

V. Behzadan and A. Munir. Vulnerability of deep reinforcement learning to policy induction attacks. arXiv, 2017.

K. Bhatia, P. Jain, and P. Kar. Robust regression via hard thresholding. In Advances in Neural Information Processing Systems (NIPS), pages 721–729, 2015.

B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In International Conference on Machine Learning (ICML), pages 1467–1474, 2012.

B. Biggio, I. Pillai, S. R. Bulò, D. Ariu, M. Pelillo, and F. Roli. Is data clustering in adversarial settings secure? In Workshop on Artificial Intelligence and Security (AISec), 2013.

B. Biggio, G. Fumera, and F. Roli. Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering, 26(4):984–996, 2014a.

B. Biggio, K. Rieck, D. Ariu, C. Wressnegger, I. Corona, G. Giacinto, and F. Roli. Poisoning behavioral malware clustering. In Workshop on Artificial Intelligence and Security (AISec), 2014b.

B. Biggio, B. S. Rota, P. Ignazio, M. Michele, M. E. Zemene, P. Marcello, and R. Fabio. Poisoning complete-linkage hierarchical clustering. In Workshop on Structural, Syntactic, and Statistical Pattern Recognition, 2014c.

M. A. Bishop. The Art and Science of Computer Security. Addison-Wesley Longman Publishing Co., Inc., 2002.

M. Brückner and T. Scheffer. Stackelberg games for adversarial prediction problems. In SIGKDD, pages 547–555, 2011.

M. Brückner, C. Kanzow, and T. Scheffer.
Static prediction games for adversarial learning problems. Journal of Machine Learning Research (JMLR), 13:2617–2654, 2012.

C. Burkard and B. Lagesse. Analysis of causative attacks against SVMs learning from data streams. In International Workshop on Security And Privacy Analytics, 2017.

N. Carlini, P. Mishra, T. Vaidya, Y. Zhang, M. Sherr, C. Shields, D. Wagner, and W. Zhou. Hidden voice commands. In USENIX Security, 2016.

M. Charikar, J. Steinhardt, and G. Valiant. Learning from untrusted data. In Symposium on Theory of Computing (STOC), 2017.

Y. Chen, C. Caramanis, and S. Mannor. Robust high dimensional sparse regression and matching pursuit. arXiv, 2013.

G. F. Cretu, A. Stavrou, M. E. Locasto, S. J. Stolfo, and A. D. Keromytis. Casting out demons: Sanitizing training data for anomaly sensors. In IEEE Symposium on Security and Privacy, pages 81–95, 2008.

I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In Foundations of Computer Science (FOCS), 2016.

S. Diamond and S. Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research (JMLR), 17(83):1–5, 2016.

J. Gardiner and S. Nagaraja. On the security of machine learning in malware C&C detection: A survey. ACM Computing Surveys (CSUR), 49(3), 2016.

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), 2014.

I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.

Gurobi Optimization, Inc. Gurobi optimizer reference manual, 2016.

S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel.
Adversarial attacks on neural network policies. arXiv, 2017.

S. M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Advances in Neural Information Processing Systems (NIPS), 2009.

A. Kerckhoffs. La cryptographie militaire. Journal des sciences militaires, 9, 1883.

A. R. Klivans, P. M. Long, and R. A. Servedio. Learning halfspaces with malicious noise. Journal of Machine Learning Research (JMLR), 10:2715–2740, 2009.

P. W. Koh and P. Liang. Understanding black-box predictions via influence functions. In International Conference on Machine Learning (ICML), 2017.

A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv, 2016.

K. A. Lai, A. B. Rao, and S. Vempala. Agnostic estimation of mean and covariance. In Foundations of Computer Science (FOCS), 2016.

R. Laishram and V. V. Phoha. Curie: A method for protecting SVM classifier from poisoning attack. arXiv, 2016.

A. Lakhina, M. Crovella, and C. Diot. Diagnosing network-wide traffic anomalies. In ACM SIGCOMM Computer Communication Review, volume 34, pages 219–230, 2004.

P. Laskov and N. Šrndić. Practical evasion of a learning-based classifier: A case study. In Symposium on Security and Privacy, 2014.

B. Li, Y. Wang, A. Singh, and Y. Vorobeychik. Data poisoning attacks on factorization-based collaborative filtering. In Advances in Neural Information Processing Systems (NIPS), 2016.

Y. Lin, Z. Hong, Y. Liao, M. Shih, M. Liu, and M. Sun. Tactics of adversarial attack on deep reinforcement learning agents. arXiv, 2017.

J. Liu and X. Zhu. The teaching dimension of linear learners. Journal of Machine Learning Research (JMLR), 17(162), 2016.

J. Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In CACSD, 2004.

A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts.
Learning word vectors for sentiment analysis. In Association for Computational Linguistics (ACL), 2011.

S. Mei and X. Zhu. The security of latent Dirichlet allocation. In Artificial Intelligence and Statistics (AISTATS), 2015a.

S. Mei and X. Zhu. Using machine teaching to identify optimal training-set attacks on machine learners. In Association for the Advancement of Artificial Intelligence (AAAI), 2015b.

V. Metsis, I. Androutsopoulos, and G. Paliouras. Spam filtering with naive Bayes – which naive Bayes? In CEAS, volume 17, pages 28–69, 2006.

M. Mozaffari-Kermani, S. Sur-Kolay, A. Raghunathan, and N. K. Jha. Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE Journal of Biomedical and Health Informatics, 19(6):1893–1905, 2015.

N. M. Nasrabadi, T. D. Tran, and N. Nguyen. Robust lasso with missing and grossly corrupted observations. In Advances in Neural Information Processing Systems (NIPS), 2011.

A. Newell, R. Potharaju, L. Xiang, and C. Nita-Rotaru. On the practicality of integrity attacks on document-level sentiment analysis. In Workshop on Artificial Intelligence and Security (AISec), pages 83–93, 2014.

J. Newsome, B. Karp, and D. Song. Paragraph: Thwarting signature learning by training maliciously. In International Workshop on Recent Advances in Intrusion Detection, 2006.

N. H. Nguyen and T. D. Tran. Exact recoverability from dense corrupted observations via ℓ1-minimization. IEEE Transactions on Information Theory, 59(4):2017–2035, 2013.

N. Papernot, P. McDaniel, and I. Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv, 2016a.

N. Papernot, P. McDaniel, A. Sinha, and M. Wellman. Towards the science of security and privacy in machine learning. arXiv, 2016b.

S. Park, J. Weimer, and I. Lee.
Resilient linear classification: an approach to deal with attacks on training data. In International Conference on Cyber-Physical Systems, pages 155–164, 2017.

B. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S. Lau, S. Rao, N. Taft, and J. Tygar. Antidote: Understanding and defending against poisoning of anomaly detectors. In ACM SIGCOMM Internet Measurement Conference, 2009.

S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2011.

J. Steinhardt, S. Wager, and P. Liang. The statistics of streaming sparse regression. arXiv preprint arXiv:1412.4182, 2014.

J. Steinhardt, G. Valiant, and M. Charikar. Avoiding imposters and delinquents: Adversarial crowdsourcing and peer prediction. In Advances in Neural Information Processing Systems (NIPS), 2016.

J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11:625–653, 1999.

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.

F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security, 2016.

J. Vuurens, A. P. de Vries, and C. Eickhoff. How much spam can you take? An analysis of crowdsourcing results to increase accuracy. In ACM SIGIR Workshop on Crowdsourcing for Information Retrieval, 2011.

G. Wang. Combating Attacks and Abuse in Large Online Communities. PhD thesis, University of California Santa Barbara, 2016.

H. Xiao, H. Xiao, and C. Eckert. Adversarial label flips attack on support vector machines. In European Conference on Artificial Intelligence, 2012.

H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli.
Is feature selection secure against training data poisoning? In International Conference on Machine Learning (ICML), 2015a.

H. Xiao, B. Biggio, B. Nelson, H. Xiao, C. Eckert, and F. Roli. Support vector machines under adversarial label contamination. Neurocomputing, 160:53–62, 2015b.

L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research (JMLR), 11:2543–2596, 2010.

C. Yang, Q. Wu, H. Li, and Y. Chen. Generative poisoning attack method against neural networks. arXiv, 2017.

Y. Zhou and M. Kantarcioglu. Modeling adversarial learning as nested Stackelberg games. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2016.