{"title": "Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation", "book": "Advances in Neural Information Processing Systems", "page_first": 2266, "page_last": 2276, "abstract": "Recent work has shown that state-of-the-art classifiers are quite brittle, in the sense that a small adversarial change of an originally with high confidence correctly classified input leads to a wrong classification again with high confidence. This raises concerns that such classifiers are vulnerable to attacks and calls into question their usage in safety-critical systems. We show in this paper for the first time formal guarantees on the robustness of a classifier by giving instance-specific \\emph{lower bounds} on the norm of the input manipulation required to change the classifier decision. Based on this analysis we propose the Cross-Lipschitz regularization functional. We show that using this form of regularization in kernel methods resp. neural networks improves the robustness of the classifier without any loss in prediction performance.", "full_text": "Formal Guarantees on the Robustness of a\nClassi\ufb01er against Adversarial Manipulation\n\nMatthias Hein and Maksym Andriushchenko\nDepartment of Mathematics and Computer Science\n\nSaarland University, Saarbr\u00fccken Informatics Campus, Germany\n\nAbstract\n\nRecent work has shown that state-of-the-art classi\ufb01ers are quite brittle,\nin the sense that a small adversarial change of an originally with high\ncon\ufb01dence correctly classi\ufb01ed input leads to a wrong classi\ufb01cation again\nwith high con\ufb01dence. This raises concerns that such classi\ufb01ers are vulnerable\nto attacks and calls into question their usage in safety-critical systems. We\nshow in this paper for the \ufb01rst time formal guarantees on the robustness\nof a classi\ufb01er by giving instance-speci\ufb01c lower bounds on the norm of the\ninput manipulation required to change the classi\ufb01er decision. 
Based on this analysis we propose the Cross-Lipschitz regularization functional. We show that using this form of regularization in kernel methods and neural networks improves the robustness of the classifier with no or only a small loss in prediction performance.

1 Introduction

The problem of adversarial manipulation of classifiers was initially addressed in the area of spam email detection, see e.g. [5, 16]. The goal of the spammer is to manipulate the spam email (the input of the classifier) in such a way that it is not detected by the classifier. In deep learning the problem was brought up in the seminal paper [24]. They showed for state-of-the-art deep neural networks that one can manipulate an originally correctly classified input image with an imperceptibly small transformation so that the classifier now misclassifies this image with high confidence, see [7] or Figure 3 for an illustration. This property calls into question the usage of neural networks and other classifiers showing this behavior in safety-critical systems, as they are vulnerable to attacks. On the other hand, it also shows that the concepts learned by a classifier are still quite far away from the visual perception of humans. Subsequent research has found fast ways to generate adversarial samples with high probability [7, 12, 19] and suggested to use them during training as a form of data augmentation in order to gain more robustness. However, it turns out that this so-called adversarial training does not settle the problem, as one can again construct adversarial examples for the final classifier. Interestingly, it has recently been shown that there exist universal adversarial changes which, when applied, lead for every image to a wrong classification with high probability [17]. 
While one needs access to the neural network model for the generation of adversarial changes, it has been shown that adversarial manipulations generalize across neural networks [18, 15, 14], which means that neural network classifiers can be attacked even in a black-box manner. The most extreme case was shown recently in [15], where the commercial system Clarifai is attacked: a black-box system, as neither the underlying classifier nor the training data are known. Nevertheless, they could successfully generate adversarial images with an existing network and fool this commercial system. This emphasizes that there are indeed severe security issues with modern neural networks.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

While countermeasures have been proposed [8, 7, 26, 18, 12, 2], none of them provides a guarantee of preventing this behavior [3]. One might think that generative adversarial neural networks should be resistant to this problem, but it has recently been shown [13] that they can also be attacked by adversarial manipulation of input images.

In this paper we show for the first time instance-specific formal guarantees on the robustness of a classifier against adversarial manipulation. That means we provide lower bounds on the norm of the change of the input required to alter the classifier decision, or, put differently: we provide a guarantee that the classifier decision does not change in a certain ball around the considered instance. We exemplify our technique for two widely used families of classifiers: kernel methods and neural networks. Based on this analysis we propose a new regularization functional, which we call Cross-Lipschitz regularization. This regularization functional can be used in kernel methods and neural networks. 
We show that using Cross-Lipschitz regularization improves both the formal guarantees of the resulting classifier (lower bounds) and the change required for adversarial manipulation (upper bounds), while maintaining prediction performance similar to that achievable with other forms of regularization. While there exist fast ways to generate adversarial samples [7, 12, 19] without constraints, we provide algorithms based on the first-order approximation of the classifier which generate adversarial samples satisfying box constraints in O(d log d), where d is the input dimension.

2 Formal Robustness Guarantees for Classifiers

In the following we consider the multi-class setting with K classes and d features, where one has a classifier f : R^d -> R^K and a point x is classified via c = arg max_{j=1,...,K} f_j(x). We call a classifier robust at x if small changes of the input do not alter the decision. Formally, the problem can be described as follows [24]. Suppose that the classifier outputs class c for input x, that is f_c(x) > f_j(x) for j != c (we assume the decision is unique). The problem of generating an input x + delta such that the classifier decision changes can be formulated as

  \min_{\delta \in \mathbb{R}^d} \|\delta\|_p \quad \text{s.th.} \quad \max_{l \neq c} f_l(x+\delta) \ge f_c(x+\delta) \quad \text{and} \quad x+\delta \in C,   (1)

where C is a constraint set specifying certain requirements on the generated input x + delta; e.g., an image has to lie in [0,1]^d. Typically, the optimization problem (1) is non-convex and thus intractable. The points x + delta generated in this way are called adversarial samples. Depending on the p-norm the perturbations have different characteristics: for p = infinity the perturbations are small and affect all features, whereas for p = 1 one gets sparse solutions, up to the extreme case that only a single feature is changed. 
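For a linear binary classifier g(x) = <w, x> + b these norm-dependent characteristics can be made fully explicit, since the minimal perturbation reaching the decision boundary has the closed form ||delta||_p = |g(x)| / ||w||_q with 1/p + 1/q = 1. A small numpy sketch (the function and variable names are ours, not from the paper):

```python
import numpy as np

def minimal_flip(w, b, x, p):
    """Minimal-norm perturbation moving x onto the hyperplane <w,x> + b = 0.

    For a linear classifier the solution of problem (1) is explicit:
    ||delta||_p = |g(x)| / ||w||_q with 1/p + 1/q = 1.
    """
    g = w @ x + b
    if p == 2:                       # spread along w
        return -g * w / (w @ w)
    if p == np.inf:                  # dense: every feature moves by the same amount
        return -(g / np.abs(w).sum()) * np.sign(w)
    if p == 1:                       # sparse: only the most influential feature moves
        i = int(np.argmax(np.abs(w)))
        delta = np.zeros_like(w)
        delta[i] = -g / w[i]
        return delta
    raise ValueError("p must be 1, 2 or inf")

rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.3
x = rng.normal(size=5)
deltas = {p: minimal_flip(w, b, x, p) for p in (1, 2, np.inf)}
for d in deltas.values():
    assert abs(w @ (x + d) + b) < 1e-9   # each perturbation lands exactly on the boundary
```

The p = 1 solution touches a single coordinate, while the p = infinity solution moves every coordinate by the same magnitude, illustrating the sparsity/density trade-off described above.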
In [24] p = 2 is used, which leads to more spread-out but still localized perturbations. The striking result of [24, 7] was that for most instances in computer vision datasets, the change delta necessary to alter the decision is astonishingly small, so that clearly the label should not change. However, we will see later that our new regularizer leads to robust classifiers in the sense that the required adversarial change is so large that now also the class label changes (we have found the correct decision boundary), see Figure 3. Already in [24] it is suggested to add the generated adversarial samples as a form of data augmentation during the training of neural networks in order to achieve robustness. This is denoted as adversarial training. Later on, fast ways to approximately solve (1) were proposed in order to speed up the adversarial training process [7, 12, 19]. However, in this way, given that the approximation is successful, that is arg max_j f_j(x + delta) != c, one gets only upper bounds on the perturbation necessary to change the classifier decision. Also it was noted early on that the final classifier obtained by adversarial training is again vulnerable to adversarial samples [7]. Robust optimization has been suggested as a measure against adversarial manipulation [12, 21], which effectively boils down to adversarial training in practice. It is thus fair to say that to date no mechanism exists which prevents the generation of adversarial samples or can defend against it [3].

In this paper we focus instead on robustness guarantees, that is, we show that the classifier decision does not change in a small ball around the instance. Thus our guarantees hold for any method to generate adversarial samples, and also for input transformations due to noise, sensor failure, etc. 
Such formal guarantees are in our view absolutely necessary when a classifier becomes part of a safety-critical technical system such as autonomous driving. In the following we first show how one can achieve such a guarantee and then explicitly derive bounds for kernel methods and neural networks. We think that such formal guarantees on robustness should be investigated further and that it should become standard to report them for different classifiers alongside the usual performance measures.

2.1 Formal Robustness Guarantee against Adversarial Manipulation

The following guarantee holds for any classifier which is continuously differentiable with respect to the input in each output component. It is instance-specific and depends to some extent on the confidence in the decision, at least if we measure confidence by the relative difference f_c(x) - max_{j != c} f_j(x), as is typical for the cross-entropy loss and other multi-class losses. In the following we use the notation B_p(x, R) = {y in R^d | ||x - y||_p <= R}.

Theorem 2.1. Let x in R^d and let f : R^d -> R^K be a multi-class classifier with continuously differentiable components, and let c = arg max_{j=1,...,K} f_j(x) be the class which f predicts for x. Let q in R be defined via 1/p + 1/q = 1. Then for all delta in R^d with

  \|\delta\|_p \le \max_{R>0} \min\Big\{ \min_{j \neq c} \frac{f_c(x) - f_j(x)}{\max_{y \in B_p(x,R)} \|\nabla f_c(y) - \nabla f_j(y)\|_q},\; R \Big\} =: \alpha,

it holds that c = arg max_{j=1,...,K} f_j(x + delta), that is, the classifier decision does not change on B_p(x, alpha).

Note that the denominator of the bound requires a bound on the local Lipschitz constant of all cross terms f_c - f_j, which we call the local cross-Lipschitz constant in the following. However, we do not require a global bound. 
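The maximization over R in Theorem 2.1 is one-dimensional; given any routine that upper bounds the local cross-Lipschitz constants on B_p(x, R), the guarantee alpha can be evaluated on a grid of radii (the paper itself uses a binary search instead). A sketch with hypothetical names:

```python
import numpy as np

def robustness_guarantee(margins, lip_bound, radii):
    """Guarantee alpha of Theorem 2.1, maximized over a grid of radii.

    margins   : f_c(x) - f_j(x) for all j != c (all positive)
    lip_bound : lip_bound(R)[j] is an upper bound on
                max_{y in B_p(x,R)} ||grad f_c(y) - grad f_j(y)||_q
    radii     : candidate radii R > 0
    """
    margins = np.asarray(margins, dtype=float)
    best = 0.0
    for R in radii:
        L = np.asarray(lip_bound(R), dtype=float)
        # inside B_p(x, R) the decision cannot change closer than this:
        alpha = min(float(np.min(margins / L)), R)
        best = max(best, alpha)
    return best

# Linear classifier: gradients are constant, so lip_bound does not depend on R.
margins = np.array([2.0, 1.0])           # f_c - f_j for the two other classes
lip = lambda R: np.array([1.0, 0.5])     # ||w_c - w_j||_q
```

For this linear example both ratios equal 2, so with a large enough grid the guarantee is 2; restricting the grid to R = 0.5 caps it at 0.5, which is exactly the role of the min with R in the theorem.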
The problem with a global bound is that the ideal robust classifier is basically piecewise constant on larger regions, with sharp transitions between the classes. The global Lipschitz constant would then be dominated by the sharp transition zones and would not yield a good bound, whereas the local bound can adapt to regions where the classifier is approximately constant and then yields good guarantees. In [24, 4] it is suggested to study the global Lipschitz constant¹ of each f_j, j = 1, ..., K. A small global Lipschitz constant for all f_j implies a good bound, as

  \|\nabla f_j(y) - \nabla f_c(y)\|_q \le \|\nabla f_j(y)\|_q + \|\nabla f_c(y)\|_q,   (2)

but the converse does not hold. As discussed below, it turns out that our local estimates are significantly better than the suggested global estimates, which also implies better robustness guarantees. In turn we want to emphasize that our bound is tight, that is, the bound is attained, for linear classifiers f_j(x) = <w_j, x>, j = 1, ..., K. It holds

  \|\delta\|_p = \min_{j \neq c} \frac{\langle w_c - w_j, x\rangle}{\|w_c - w_j\|_q}.

In Section 4 we refine this result for the case when the input is constrained to [0,1]^d. In general, it is possible to integrate constraints on the input simply by taking the maximum over the intersection of B_p(x, R) with the constraint set, e.g. [0,1]^d for gray-scale images.

2.2 Evaluation of the Bound for Kernel Methods

Next we discuss how the bound can be evaluated for different classifier models. For simplicity we restrict ourselves to the case p = 2 (which implies q = 2) and leave the other cases to future work. We consider the class of kernel methods, that is, the classifier has the form

  f_j(x) = \sum_{r=1}^{n} \alpha_{jr}\, k(x_r, x),

where (x_r)_{r=1}^n are the n training points, k : R^d x R^d -> R is a positive definite kernel function, and alpha in R^{K x n} are the trained parameters, e.g. of an SVM. 
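For linear classifiers the bound is not only valid but attained; a quick numerical check of the tightness claim (numpy, with our own variable names):

```python
import numpy as np

def linear_guarantee(W, x, q=2):
    """Guaranteed radius min_{j != c} <w_c - w_j, x> / ||w_c - w_j||_q
    for the linear classifier f_j(x) = <w_j, x>."""
    c = int(np.argmax(W @ x))
    ratios = {j: (W[c] - W[j]) @ x / np.linalg.norm(W[c] - W[j], ord=q)
              for j in range(W.shape[0]) if j != c}
    j_star = min(ratios, key=ratios.get)
    return c, j_star, ratios[j_star]

rng = np.random.default_rng(1)
W, x = rng.normal(size=(4, 6)), rng.normal(size=6)
c, j_star, alpha = linear_guarantee(W, x)
u = (W[c] - W[j_star]) / np.linalg.norm(W[c] - W[j_star])
# just inside the guaranteed ball the decision is unchanged; just outside it flips:
assert np.argmax(W @ (x - 0.99 * alpha * u)) == c
assert np.argmax(W @ (x - 1.01 * alpha * u)) != c
```

Walking distance alpha along the direction -(w_c - w_{j*}) lands exactly on the decision boundary, which is what tightness of Theorem 2.1 means for the linear case.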
The goal is to upper bound the term max_{y in B_2(x,R)} ||nabla f_j(y) - nabla f_c(y)||_2 for this classifier model. A simple calculation shows

  0 \le \|\nabla f_j(y) - \nabla f_c(y)\|_2^2 = \sum_{r,s=1}^{n} (\alpha_{jr} - \alpha_{cr})(\alpha_{js} - \alpha_{cs}) \langle \nabla_y k(x_r, y), \nabla_y k(x_s, y) \rangle.   (3)

It has been reported that kernel methods with a Gaussian kernel are robust to noise. Thus we now specialize to this class, that is k(x,y) = e^{-gamma ||x-y||_2^2}. In this case

  \langle \nabla_y k(x_r,y), \nabla_y k(x_s,y) \rangle = 4\gamma^2 \langle y - x_r,\, y - x_s \rangle\, e^{-\gamma\|x_r - y\|_2^2}\, e^{-\gamma\|x_s - y\|_2^2}.

We derive the following bound.

Proposition 2.1. Let beta_r = alpha_{jr} - alpha_{cr}, r = 1, ..., n, and define M = min{ ||2x - x_r - x_s||_2 / 2, R } and S = ||2x - x_r - x_s||_2. Then

  \max_{y \in B_2(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_2 \le 2\gamma \Bigg( \sum_{\substack{r,s=1 \\ \beta_r\beta_s \ge 0}}^{n} \beta_r\beta_s \Big[ \max\{\langle x - x_r, x - x_s\rangle + RS + R^2,\, 0\}\, e^{-\gamma(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 - 2MS + 2M^2)}
      + \min\{\langle x - x_r, x - x_s\rangle + RS + R^2,\, 0\}\, e^{-\gamma(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 + 2RS + 2R^2)} \Big]
    + \sum_{\substack{r,s=1 \\ \beta_r\beta_s < 0}}^{n} \beta_r\beta_s \Big[ \max\{\langle x - x_r, x - x_s\rangle - MS + M^2,\, 0\}\, e^{-\gamma(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 + 2RS + 2R^2)}
      + \min\{\langle x - x_r, x - x_s\rangle - MS + M^2,\, 0\}\, e^{-\gamma(\|x-x_r\|_2^2 + \|x-x_s\|_2^2 - 2MS + 2M^2)} \Big] \Bigg)^{1/2}

¹The Lipschitz constant L with respect to the p-norm of a piecewise continuously differentiable function f is given as L = sup_{x in R^d} ||nabla f(x)||_q. Then it holds |f(x) - f(y)| <= L ||x - y||_p.

While the bound leads to non-trivial estimates as seen in Section 5, it is not very tight. The reason is that the sum is bounded elementwise, which is quite pessimistic. 
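Proposition 2.1 can be checked numerically: at R = 0 the bound collapses to the exact gradient-difference norm at x, and for R > 0 it dominates that norm at every point of the ball. A sketch (numpy; beta, gamma and the training points are random stand-ins, not values from the paper):

```python
import numpy as np

def grad_diff(X, beta, gamma, y):
    """grad f_j(y) - grad f_c(y) = -2*gamma * sum_r beta_r * k(x_r, y) * (y - x_r)."""
    D = y - X                                   # rows: y - x_r
    k = np.exp(-gamma * (D * D).sum(axis=1))    # Gaussian kernel values
    return -2.0 * gamma * (beta * k) @ D

def cross_lip_bound(X, beta, gamma, x, R):
    """Upper bound of Proposition 2.1 on max_{||y-x||_2 <= R} ||grad f_j - grad f_c||_2."""
    Dx = x - X
    sq = (Dx * Dx).sum(axis=1)                  # ||x - x_r||^2
    total = 0.0
    for r in range(len(X)):
        for s in range(len(X)):
            b = beta[r] * beta[s]
            ip = Dx[r] @ Dx[s]                  # <x - x_r, x - x_s>
            S = np.linalg.norm(2 * x - X[r] - X[s])
            M = min(S / 2.0, R)
            e_hi = np.exp(-gamma * (sq[r] + sq[s] - 2 * M * S + 2 * M * M))
            e_lo = np.exp(-gamma * (sq[r] + sq[s] + 2 * R * S + 2 * R * R))
            if b >= 0:
                ub = ip + R * S + R * R         # upper bound of <y - x_r, y - x_s>
                total += b * (max(ub, 0.0) * e_hi + min(ub, 0.0) * e_lo)
            else:
                lb = ip - M * S + M * M         # lower bound of <y - x_r, y - x_s>
                total += b * (max(lb, 0.0) * e_lo + min(lb, 0.0) * e_hi)
    return 2.0 * gamma * np.sqrt(max(total, 0.0))

rng = np.random.default_rng(2)
X, beta, gamma = rng.normal(size=(8, 3)), rng.normal(size=8), 0.5
x, R = rng.normal(size=3), 0.7
bound = cross_lip_bound(X, beta, gamma, x, R)
# exact at R = 0, an upper bound over the whole ball for R > 0:
assert np.isclose(cross_lip_bound(X, beta, gamma, x, 0.0),
                  np.linalg.norm(grad_diff(X, beta, gamma, x)))
for _ in range(50):
    d = rng.normal(size=3)
    y = x + R * rng.uniform() * d / np.linalg.norm(d)   # a point inside B_2(x, R)
    assert np.linalg.norm(grad_diff(X, beta, gamma, y)) <= bound + 1e-9
```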
We think that better bounds are possible but have to postpone this to future work.

2.3 Evaluation of the Bound for Neural Networks

We derive the bound for a neural network with one hidden layer. In principle, the technique we apply below can be used for arbitrarily many layers, but the computational complexity increases rapidly. The problem is that in the directed network topology one has to consider almost every path separately to derive the bound. Let U be the number of hidden units and let w and u be the weight matrices of the output and input layer, respectively. We assume that the activation function sigma is continuously differentiable and that its derivative sigma' is monotonically increasing. The prototype activation function we have in mind, and which we use later in the experiments, is the differentiable approximation sigma_alpha(x) = (1/alpha) log(1 + e^{alpha x}) of the ReLU activation sigma_ReLU(x) = max{0, x}. Note that lim_{alpha -> infinity} sigma_alpha(x) = sigma_ReLU(x) and sigma'_alpha(x) = 1 / (1 + e^{-alpha x}). The output of the neural network can be written as

  f_j(x) = \sum_{r=1}^{U} w_{jr}\, \sigma\Big( \sum_{s=1}^{d} u_{rs} x_s \Big), \quad j = 1, \ldots, K,

where for simplicity we omit any bias terms, but it is straightforward to also consider models with bias. A direct computation shows that

  \|\nabla f_j(y) - \nabla f_c(y)\|_2^2 = \sum_{r,m=1}^{U} (w_{jr} - w_{cr})(w_{jm} - w_{cm})\, \sigma'(\langle u_r, y\rangle)\, \sigma'(\langle u_m, y\rangle) \sum_{l=1}^{d} u_{rl} u_{ml},   (4)

where u_r in R^d is the r-th row of the weight matrix u in R^{U x d}. The resulting bound is given in the following proposition.

Proposition 2.2. Let sigma be a continuously differentiable activation function with sigma' monotonically increasing. Define beta_{rm} = (w_{jr} - w_{cr})(w_{jm} - w_{cm}) \sum_{l=1}^d u_{rl} u_{ml}. Then

  \max_{y \in B_2(x,R)} \|\nabla f_j(y) - \nabla f_c(y)\|_2 \le \Big[ \sum_{r,m=1}^{U} \max\{\beta_{rm}, 0\}\, \sigma'(\langle u_r, x\rangle + R\|u_r\|_2)\, \sigma'(\langle u_m, x\rangle + R\|u_m\|_2) + \min\{\beta_{rm}, 0\}\, \sigma'(\langle u_r, x\rangle - R\|u_r\|_2)\, \sigma'(\langle u_m, x\rangle - R\|u_m\|_2) \Big]^{1/2}

As discussed above, the global Lipschitz bounds of the individual classifier outputs, see (2), lead to an upper bound on our desired local cross-Lipschitz constant. In the experiments below, our local bounds on the Lipschitz constant are up to 8 times smaller than what one would achieve via the global Lipschitz bounds of [24]. This shows that the global approach is much too rough to yield meaningful robustness guarantees.

3 The Cross-Lipschitz Regularization Functional

We have seen in Section 2 that if

  \max_{j \neq c}\; \max_{y \in B_p(x,R)} \|\nabla f_c(y) - \nabla f_j(y)\|_q   (5)

is small and f_c(x) - f_j(x) is large, then we get good robustness guarantees. The latter property is typically already optimized by a multi-class loss function. For all methods in this paper we consider the cross-entropy loss, so that the differences in the results come only from the chosen function class (kernel methods versus neural networks) and the chosen regularization functional. The cross-entropy loss L : {1, ..., K} x R^K -> R is given as

  L(y, f(x)) = -\log\Big( \frac{e^{f_y(x)}}{\sum_{k=1}^{K} e^{f_k(x)}} \Big) = \log\Big( 1 + \sum_{k \neq y} e^{f_k(x) - f_y(x)} \Big).

In the latter formulation it becomes apparent that the loss tries to make the difference f_y(x) - f_k(x) as large as possible for all k. As our goal is good robustness guarantees, it is natural to consider a proxy of the quantity in (5) for regularization. 
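Before turning to the regularizer, note that the bound of Proposition 2.2 is straightforward to evaluate and to sanity-check numerically: at R = 0 it is exact, and for R > 0 it dominates the gradient-difference norm over the ball. A sketch (numpy; the weights are random stand-ins, sigma' is the softplus derivative from Section 2.3):

```python
import numpy as np

def sigma_prime(z, a=10.0):
    """Derivative 1 / (1 + exp(-a z)) of the softplus approximation of ReLU."""
    return 1.0 / (1.0 + np.exp(-a * z))

def grad_diff(w, U, c, j, y, a=10.0):
    """grad f_j(y) - grad f_c(y) for f_j(x) = sum_r w[j, r] * sigma(<U[r], x>)."""
    v = w[j] - w[c]
    return (v * sigma_prime(U @ y, a)) @ U

def nn_cross_lip_bound(w, U, c, j, x, R, a=10.0):
    """Upper bound of Proposition 2.2 on max_{||y-x||_2 <= R} ||grad f_j - grad f_c||_2."""
    v = w[j] - w[c]
    beta = np.outer(v, v) * (U @ U.T)           # beta_rm
    z, nrm = U @ x, np.linalg.norm(U, axis=1)   # <u_r, x> and ||u_r||_2
    hi = sigma_prime(z + R * nrm, a)            # largest sigma' on the ball
    lo = sigma_prime(z - R * nrm, a)            # smallest sigma' on the ball
    total = (np.maximum(beta, 0) * np.outer(hi, hi)
             + np.minimum(beta, 0) * np.outer(lo, lo)).sum()
    return np.sqrt(max(total, 0.0))

rng = np.random.default_rng(3)
w, U = rng.normal(size=(4, 6)), rng.normal(size=(6, 5))
x, R, c, j = rng.normal(size=5), 0.3, 0, 1
bound = nn_cross_lip_bound(w, U, c, j, x, R)
assert np.isclose(nn_cross_lip_bound(w, U, c, j, x, 0.0),
                  np.linalg.norm(grad_diff(w, U, c, j, x)))
for _ in range(50):
    d = rng.normal(size=5)
    y = x + R * rng.uniform() * d / np.linalg.norm(d)   # a point inside B_2(x, R)
    assert np.linalg.norm(grad_diff(w, U, c, j, y)) <= bound + 1e-9
```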
We define the Cross-Lipschitz regularization functional as

  \Omega(f) = \frac{1}{nK^2} \sum_{i=1}^{n} \sum_{l,m=1}^{K} \|\nabla f_l(x_i) - \nabla f_m(x_i)\|_2^2,   (6)

where (x_i)_{i=1}^n are the training points. The goal of this regularization functional is to make the differences of the classifier functions at the data points as constant as possible. In total, by minimizing

  \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) + \lambda\, \Omega(f)   (7)

over some function class, we try to maximize f_c(x_i) - f_j(x_i) and at the same time keep ||nabla f_l(x_i) - nabla f_m(x_i)||_2^2 small uniformly over all classes. This automatically enforces robustness of the resulting classifier. It is important to note that this regularization functional is coherent with the loss, as it shares the same degrees of freedom: adding the same function g to all outputs, f'_j(x) = f_j(x) + g(x), leaves both loss and regularization functional invariant. This is the main difference to [4], where the global Lipschitz constant is enforced to be smaller than one.

3.1 Cross-Lipschitz Regularization in Kernel Methods

In kernel methods one typically uses the regularization functional induced by the kernel, which is given as the squared norm of the function f(x) = \sum_{i=1}^n \alpha_i k(x_i, x) in the corresponding reproducing kernel Hilbert space H_k: \|f\|_{H_k}^2 = \sum_{i,j=1}^n \alpha_i \alpha_j k(x_i, x_j). In particular, for translation-invariant kernels one can directly make a connection to penalization of derivatives of the function f via the Fourier transform, see [20]. However, penalizing higher-order derivatives is irrelevant for achieving robustness. 
Given the kernel expansion of f, one can write the Cross-Lipschitz regularization functional as

  \Omega(f) = \frac{1}{nK^2} \sum_{i=1}^{n} \sum_{l,m=1}^{K} \sum_{r,s=1}^{n} (\alpha_{lr} - \alpha_{mr})(\alpha_{ls} - \alpha_{ms}) \langle \nabla_y k(x_r, x_i), \nabla_y k(x_s, x_i) \rangle.

Omega is convex in alpha in R^{K x n}, as k'(x_r, x_s) = <nabla_y k(x_r, x_i), nabla_y k(x_s, x_i)> is a positive definite kernel for any x_i, and with the convex cross-entropy loss the learning problem (7) is convex.

3.2 Cross-Lipschitz Regularization in Neural Networks

The standard way to regularize neural networks is weight decay; that is, the squared Euclidean norm of all weights is added to the objective. More recently dropout [22], which can be seen as a form of stochastic regularization, has been introduced. Dropout can also be interpreted as a form of regularization of the weights [22, 10]. It is interesting to note that classical regularization functionals which penalize derivatives of the resulting classifier function are typically not used in deep learning, but see [6, 11]. As noted above, we restrict ourselves to one-hidden-layer networks to simplify the notation, that is, f_j(x) = \sum_{r=1}^U w_{jr} \sigma(\sum_{s=1}^d u_{rs} x_s), j = 1, ..., K. Then we can write the Cross-Lipschitz regularization as

  \Omega(f) = \frac{2}{nK^2} \sum_{i=1}^{n} \sum_{r,s=1}^{U} \Big( K \sum_{l=1}^{K} w_{lr} w_{ls} - \sum_{l=1}^{K} w_{lr} \sum_{m=1}^{K} w_{ms} \Big)\, \sigma'(\langle u_r, x_i\rangle)\, \sigma'(\langle u_s, x_i\rangle) \sum_{l=1}^{d} u_{rl} u_{sl},

which leads to an expression that can be evaluated quickly using vectorization. Obviously, one can also implement the Cross-Lipschitz regularization for all standard deep networks.

4 Box Constrained Adversarial Sample Generation

The main emphasis of this paper is on robustness guarantees, without resorting to particular ways of generating adversarial samples. 
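The vectorized evaluation mentioned in Section 3.2 can be sketched with a few matrix operations and checked against a naive double loop over the class pairs in (6); names and random weights are ours, not from the paper:

```python
import numpy as np

def sigma_prime(z, a=10.0):
    return 1.0 / (1.0 + np.exp(-a * z))

def omega_vectorized(w, U, X, a=10.0):
    """Cross-Lipschitz regularizer (6) for f_j(x) = sum_r w[j, r] * sigma(<U[r], x>)."""
    K, n = w.shape[0], X.shape[0]
    G = U @ U.T                                          # <u_r, u_s>
    C = K * (w.T @ w) - np.outer(w.sum(0), w.sum(0))     # K sum_l w_lr w_ls - sum_l w_lr sum_m w_ms
    S = sigma_prime(X @ U.T, a)                          # sigma'(<u_r, x_i>)
    return 2.0 * np.einsum('ir,is,rs,rs->', S, S, C, G) / (n * K * K)

def omega_naive(w, U, X, a=10.0):
    """Direct evaluation of (6) from the gradients."""
    K, n, total = w.shape[0], X.shape[0], 0.0
    for i in range(n):
        g = (w * sigma_prime(U @ X[i], a)) @ U           # row l: grad f_l(x_i)
        for l in range(K):
            for m in range(K):
                total += np.sum((g[l] - g[m]) ** 2)
    return total / (n * K * K)

rng = np.random.default_rng(4)
w, U, X = rng.normal(size=(3, 7)), rng.normal(size=(7, 4)), rng.normal(size=(10, 4))
assert np.isclose(omega_vectorized(w, U, X), omega_naive(w, U, X))
```

The vectorized form avoids the K^2 loop entirely, which is what makes the regularizer cheap enough to use inside SGD training.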
On the other hand, while Theorem 2.1 gives lower bounds on the required input transformation, efficient ways to approximately solve the adversarial sample generation problem (1) are helpful to get upper bounds on the required change. Upper bounds allow us to check how tight our derived lower bounds are. As all of our experiments concern images, it is reasonable that our adversarial samples are also images. However, to our knowledge, the current main techniques to generate adversarial samples [7, 12, 19] integrate box constraints only by clipping the result to [0,1]^d. In the following we provide fast algorithms to generate adversarial samples which lie in [0,1]^d. The strategy is similar to [12], where a linear approximation of the classifier is used to derive adversarial samples with respect to different norms. Formally,

  f_j(x + \delta) \approx f_j(x) + \langle \nabla f_j(x), \delta \rangle, \quad j = 1, \ldots, K.

Assuming that the linear approximation holds, the optimization problem (1) with box constraints for changing class c into j becomes

  \min_{\delta \in \mathbb{R}^d} \|\delta\|_p \quad \text{sbj. to:} \quad f_j(x) - f_c(x) \ge \langle \nabla f_c(x) - \nabla f_j(x), \delta \rangle, \quad 0 \le x_i + \delta_i \le 1, \; i = 1, \ldots, d.   (8)

In order to get the minimal adversarial sample we have to solve this problem for all j != c and take the solution with minimal ||delta||_p. This yields the minimal adversarial change for linear classifiers. Note that (8) is a convex optimization problem which can be reduced to a one-parameter problem in the dual. This allows us to derive the following result (proofs and algorithms are in the supplement).

Proposition 4.1. 
Let p in {1, 2, infinity}. Then (8) can be solved in O(d log d) time.

For nonlinear classifiers a change of the decision is not guaranteed, and thus later on we use a binary search with a variable c instead of f_c(x) - f_j(x).

5 Experiments

The goal of the experiments is the evaluation of the robustness of the resulting classifiers and not necessarily state-of-the-art results in terms of test error. In all cases we compute the robustness guarantees from Theorem 2.1 (lower bound on the norm of the minimal change required to alter the classifier decision), where we optimize over R using binary search, and adversarial samples with the algorithm for the 2-norm from Section 4 (upper bound on the norm of the minimal change required to alter the classifier decision), where we do a binary search on the classifier output difference in order to find a point on the decision boundary. Additional experiments can be found in the supplementary material.

Kernel methods: We optimize the cross-entropy loss once with the standard regularization (Kernel-LogReg) and once with Cross-Lipschitz regularization (Kernel-CL). Both are convex optimization problems and we use L-BFGS to solve them. We use the Gaussian kernel k(x,y) = e^{-gamma ||x-y||^2}, where gamma = alpha / rho^2_{KNN40}, rho_{KNN40} is the mean of the 40-nearest-neighbor distances on the training set, and alpha in {0.5, 1, 2, 4}. We show the results for MNIST (60000 training and 10000 test samples). However, we have checked that parameter selection using a subset of 50000 images from the training set and evaluating on the rest indeed yields the parameters which give the best test errors when trained on the full set. The regularization parameter is chosen in lambda in {10^{-k} | k in {5, 6, 7, 8}} for Kernel-SVM and lambda in {10^{-k} | k in {0, 1, 2, 3}} for our Kernel-CL. 
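The upper bounds reported in the experiments come from the 2-norm attack of Section 4. For p = 2, the one-parameter dual of (8) can be approximately solved by simple bisection; a sketch with our own names (the paper's exact O(d log d) algorithm is in the supplement):

```python
import numpy as np

def box_l2_attack(a, c, x, iters=80):
    """Approximately minimal ||delta||_2 with <a, delta> >= c and 0 <= x + delta <= 1.

    Here a = grad f_j(x) - grad f_c(x) and c = f_c(x) - f_j(x) > 0, so the
    constraint is the linearized decision change of problem (8). The dual is
    one-dimensional: delta(lam) clips lam * a to the box, and <a, delta(lam)>
    is nondecreasing in lam, so bisection on lam suffices.
    """
    lo_box, hi_box = -x, 1.0 - x
    delta = lambda lam: np.clip(lam * a, lo_box, hi_box)
    if a @ delta(1e12) < c:          # decision change infeasible inside the box
        return None
    lo, hi = 0.0, 1.0
    while a @ delta(hi) < c:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if a @ delta(mid) >= c else (mid, hi)
    return delta(hi)

rng = np.random.default_rng(5)
a, x, c = rng.normal(size=8), np.full(8, 0.5), 0.01
d = box_l2_attack(a, c, x)
assert np.all(x + d >= -1e-12) and np.all(x + d <= 1 + 1e-12)   # box satisfied
assert a @ d >= c - 1e-9                                        # constraint met
# box inactive here, so the solution matches the unconstrained one c * a / ||a||^2:
assert np.isclose(np.linalg.norm(d), c / np.linalg.norm(a), rtol=1e-6)
```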
The results for the optimal parameters are given in the following table, and the performance of all parameters is shown in Figure 1. Note that due to the high computational complexity we could evaluate the robustness guarantees only for the optimal parameters.

                   test error   avg. ||.||_2 adv. samples   avg. ||.||_2 rob. guar.
  No Reg. (lambda = 0)   2.23%        2.39                        0.037
  K-SVM                  1.48%        1.91                        0.058
  K-CL                   1.44%        3.12                        0.045

Figure 1: Kernel methods: Cross-Lipschitz regularization achieves both better test error and robustness against adversarial samples (upper bounds, larger is better) compared to the standard regularization. The robustness guarantee is weaker than for neural networks, but this is most likely due to the relatively loose bound.

Neural networks: Before we demonstrate how upper and lower bounds improve using Cross-Lipschitz regularization, we first want to highlight the importance of using the local cross-Lipschitz constant in Theorem 2.1 for our robustness guarantee.

Local versus global cross-Lipschitz constant: While no robustness guarantee had been proven before, it has been discussed in [24] that penalization of the global Lipschitz constant should improve robustness, see also [4]. For that purpose they derive the Lipschitz constants of several different layers and use the fact that the Lipschitz constant of a composition of functions is upper bounded by the product of the Lipschitz constants of the functions. In analogy, this would mean that the term sup_{y in B(x,R)} ||nabla f_c(y) - nabla f_j(y)||_2, which we have upper bounded in Proposition 2.2, in the denominator of Theorem 2.1 could be replaced² by the global Lipschitz constant of g(x) := f_c(x) - f_j(x), which is given as

  \sup_{y \in \mathbb{R}^d} \|\nabla g(y)\|_2 = \sup_{x \neq y} \frac{|g(x) - g(y)|}{\|x - y\|_2}. 
With ||U||_{2,2} denoting the largest singular value of U, we have

  |g(x) - g(y)| = |\langle w_c - w_j, \sigma(Ux) - \sigma(Uy)\rangle| \le \|w_c - w_j\|_2\, \|\sigma(Ux) - \sigma(Uy)\|_2 \le \|w_c - w_j\|_2\, \|U(x - y)\|_2 \le \|w_c - w_j\|_2\, \|U\|_{2,2}\, \|x - y\|_2,

where we used that sigma is contractive as sigma'(z) = 1 / (1 + e^{-alpha z}) <= 1, and thus we get

  \sup_{y \in \mathbb{R}^d} \|\nabla f_c(y) - \nabla f_j(y)\|_2 \le \|w_c - w_j\|_2\, \|U\|_{2,2}.

²Note that then the optimization over R in Theorem 2.1 would be unnecessary.

                        MNIST (plain)                              CIFAR10 (plain)
                   None   Dropout   Weight Dec.   Cross Lip.   None   Dropout   Weight Dec.   Cross Lip.
  alpha_global/alpha_local   0.24   0.22      0.21          0.13         0.69   0.68      0.48          0.17

Table 1: We show the average ratio alpha_global/alpha_local of the robustness guarantees alpha_global, alpha_local from Theorem 2.1 on the test data for MNIST and CIFAR10 and different regularizers. The guarantees using the local cross-Lipschitz constant are up to eight times better than with the global one.

The advantage is clearly that this global cross-Lipschitz constant has to be computed only once, and by using it in Theorem 2.1 the guarantees can be evaluated very quickly. However, it turns out that one gets significantly better robustness guarantees by using the local cross-Lipschitz constant, in terms of the bound derived in Proposition 2.2, instead of the global Lipschitz constant just derived. Note that the optimization over R in Theorem 2.1 is done using binary search, noting that the bound on the local Lipschitz constant in Proposition 2.2 is monotonically increasing in R, so that the ratio in Theorem 2.1 is monotonically decreasing in R. We report the comparison in Table 1. We want to highlight that the robustness guarantee with the global cross-Lipschitz constant was always worse than that with the local cross-Lipschitz constant, across all regularizers and datasets. 
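The derivation above can be verified numerically: with sigma' bounded by one, the gradient of f_c - f_j can never exceed ||w_c - w_j||_2 * ||U||_{2,2} anywhere in input space. A sketch (our names, random weights):

```python
import numpy as np

def global_cross_lipschitz(w, U, c, j):
    """Global bound ||w_c - w_j||_2 * ||U||_{2,2} on ||grad f_c - grad f_j||_2."""
    spectral = np.linalg.svd(U, compute_uv=False)[0]     # largest singular value
    return np.linalg.norm(w[c] - w[j]) * spectral

def grad_diff(w, U, c, j, y, a=10.0):
    """Gradient of f_c - f_j for the one-hidden-layer softplus network."""
    sp = 1.0 / (1.0 + np.exp(-a * (U @ y)))              # sigma' in (0, 1)
    return ((w[c] - w[j]) * sp) @ U

rng = np.random.default_rng(6)
w, U = rng.normal(size=(5, 10)), rng.normal(size=(10, 6))
bound = global_cross_lipschitz(w, U, 0, 1)
for _ in range(100):
    y = rng.normal(size=6)                               # arbitrary input point
    assert np.linalg.norm(grad_diff(w, U, 0, 1, y)) <= bound + 1e-9
```

The check only confirms validity; as Table 1 shows, this global constant is far looser than the local bound of Proposition 2.2 on any concrete ball B_2(x, R).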
Table 1 shows that the guarantees using the local cross-Lipschitz constant can be up to eight times better than those using the global one. As these are just one-hidden-layer networks, it is obvious that robustness guarantees for deep neural networks based on global Lipschitz constants will be too coarse to be useful.

Experiments: We use a one-hidden-layer network with 1024 hidden units and the softplus activation function with alpha = 10; thus the resulting classifier is continuously differentiable. We compare three different regularization techniques: weight decay, dropout, and our Cross-Lipschitz regularization. Training is done with SGD. For each method we have adapted the learning rates (two per method) and regularization parameters (four per method) so that all methods achieve good performance. We run experiments for MNIST and CIFAR10 in three settings: plain, data augmentation, and adversarial training. The exact parameter settings and the augmentation techniques are described in the supplementary material. The results for MNIST are shown in Figure 2 and the results for CIFAR10 are in the supplementary material. For MNIST there is a clear trend that our Cross-Lipschitz regularization improves the robustness of the resulting classifier while having competitive or better test error. It is surprising that data augmentation does not lead to more robust models. However, adversarial training improves the guarantees as well as the adversarial resistance. For CIFAR10 the picture is mixed: our CL regularization performs well for the augmented task in test error and upper bounds, but is not significantly better in the robustness guarantees. The problem might be that the overall bad performance due to the simple model prevents better behavior. Data augmentation leads to better test error, but the robustness properties (upper and lower bounds) are basically unchanged. 
Adversarial\ntraining slightly improves performance compared to the plain setting and improves upper\nand lower bounds in terms of robustness. We want to highlight that our guarantees (lower\nbounds) and the upper bounds from the adversarial samples are not too far away.\n\nIllustration of adversarial samples: we take one test image from MNIST and apply the\nadversarial generation from Section 4 wrt to the 2-norm to generate the adversarial samples for\nthe di\ufb00erent kernel methods and neural networks (plain setting), where we use for each method\nthe parameters leading to best test performance. All classi\ufb01ers change their originally correct\ndecision to a \u201cwrong\u201d one. It is interesting to note that for Cross-Lipschitz regularization\n(both kernel method and neural network) the \u201cadversarial\u201d sample is really at the decision\nboundary between 1 and 8 (as predicted) and thus the new decision is actually correct.\nThis e\ufb00ect is strongest for our Kernel-CL, which also requires the strongest modi\ufb01cation to\ngenerate the adversarial sample. The situation is di\ufb00erent for neural networks, where the\nclassi\ufb01ers obtained from the two standard regularization techniques are still vulnerable, as\nthe adversarial sample is still clearly a 1 for dropout and weight decay.\n\nOutlook Formal guarantees on machine learning systems are becoming increasingly more\nimportant as they are used in safety-critical systems. We think that there should be more\n\n8\n\n\fAdversarial Resistance (Upper Bound)\n\nwrt to L2-norm\n\nRobustness Guarantee (Lower Bound)\n\nwrt to L2-norm\n\nFigure 2: Neural Networks, Left: Adversarial resistance wrt to L2-norm on MNIST. Right: Average ro-\nbustness guarantee wrt to L2-norm on MNIST for di\ufb00erent neural networks (one hidden layer, 1024 HU) and\nhyperparameters. The Cross-Lipschitz regularization leads to better robustness with similar or better prediction\nperformance. 
Top row: plain MNIST; middle: data augmentation; bottom: adversarial training.

research on robustness guarantees (lower bounds), whereas current research is focused on new attacks (upper bounds). We have argued that our instance-specific guarantees using the local Cross-Lipschitz constant are more effective than using a global one and lead to lower bounds which are up to 8 times better. A major open problem is to come up with tight lower bounds for deep networks.

Original, class 1; K-SVM, Pred: 7, ‖δ‖₂ = 1.2; K-CL, Pred: 8, ‖δ‖₂ = 3.5; NN-WD, Pred: 8, ‖δ‖₂ = 1.2; NN-DO, Pred: 7, ‖δ‖₂ = 1.1; NN-CL, Pred: 8, ‖δ‖₂ = 2.6.
Figure 3: Top left: original test image. For each classifier we generate the corresponding adversarial sample which changes the classifier decision (denoted as Pred). Note that for Cross-Lipschitz regularization this new decision often makes sense, whereas for the neural network models (weight decay/dropout) the change is so small that the new decision is clearly wrong.

References
[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. J. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Józefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. G. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. A. Tucker, V. Vanhoucke, V. Vasudevan, F. B. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems, 2016.
[2] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi. Measuring neural net robustness with constraints. In NIPS, 2016.
[3] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, 2017.
[4] M.
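To make the adversarial generation wrt the 2-norm mentioned above concrete, the linear case is instructive: there the smallest perturbation that changes the decision is an exact step to the nearest decision boundary. The NumPy sketch below illustrates this linearization idea (a simplified illustration of the principle underlying Section 4 for a linear classifier, not the paper's exact procedure; the function and variable names are our own):

```python
import numpy as np

def min_l2_adversarial(W, b, x):
    """Smallest 2-norm perturbation that flips a linear classifier.

    For f(x) = W @ x + b with predicted class c, the distance to the
    decision boundary against class l is (f_c - f_l) / ||w_l - w_c||_2;
    stepping just past the nearest such boundary changes the decision.
    """
    f = W @ x + b
    c = int(np.argmax(f))
    best_dist, delta = np.inf, None
    for l in range(W.shape[0]):
        if l == c:
            continue
        w_diff = W[l] - W[c]
        dist = (f[c] - f[l]) / np.linalg.norm(w_diff)
        if dist < best_dist:
            best_dist = dist
            # move along the boundary normal by exactly that distance
            delta = (f[c] - f[l]) / np.dot(w_diff, w_diff) * w_diff
    return delta * (1.0 + 1e-6)  # tiny overshoot to actually cross
```

For linear classifiers this step is exact, so the resulting ‖δ‖₂ is simultaneously an upper and a lower bound; for nonlinear models the same linearization only yields the upper bounds (adversarial resistance) reported in the experiments.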
Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier. Parseval networks: Improving robustness to adversarial examples. In ICML, 2017.
[5] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. Adversarial classification. In KDD, 2004.
[6] H. Drucker and Y. Le Cun. Double backpropagation increasing generalization performance. In IJCNN, 1992.
[7] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[8] S. Gu and L. Rigazio. Towards deep neural network architectures robust to adversarial examples. In ICLR Workshop, 2015.
[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
[10] D. P. Helmbold and P. Long. On the inductive bias of dropout. Journal of Machine Learning Research, 16:3403–3454, 2015.
[11] S. Hochreiter and J. Schmidhuber. Simplifying neural nets by discovering flat minima. In NIPS, 1995.
[12] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvari. Learning with a strong adversary. In ICLR, 2016.
[13] J. Kos, I. Fischer, and D. Song. Adversarial examples for generative models. In ICLR Workshop, 2017.
[14] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In ICLR Workshop, 2017.
[15] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. In ICLR, 2017.
[16] D. Lowd and C. Meek. Adversarial learning. In KDD, 2005.
[17] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In CVPR, 2017.
[18] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep networks. In IEEE Symposium on Security & Privacy, 2016.
[19] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks.
In CVPR, pages 2574–2582, 2016.
[20] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[21] U. Shaham, Y. Yamada, and S. Negahban. Understanding adversarial training: Increasing local stability of neural nets through robust optimization. In NIPS, 2016.
[22] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.
[23] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323–332, 2012.
[24] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[25] S. Zagoruyko and N. Komodakis. Wide residual networks. In BMVC, pages 87.1–87.12, 2016.
[26] S. Zheng, Y. Song, T. Leung, and I. J. Goodfellow. Improving the robustness of deep neural networks via stability training. In CVPR, 2016.