{"title": "Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers", "book": "Advances in Neural Information Processing Systems", "page_first": 11292, "page_last": 11303, "abstract": "Recent works have shown the effectiveness of randomized smoothing as a scalable technique for building neural network-based classifiers that are provably robust to $\\ell_2$-norm adversarial perturbations. In this paper, we employ adversarial training to improve the performance of randomized smoothing. We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers. We demonstrate through extensive experimentation that our method consistently outperforms all existing provably $\\ell_2$-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable $\\ell_2$-defenses. Moreover, we find that pre-training and semi-supervised learning boost adversarially trained smoothed classifiers even further. Our code and trained models are available at http://github.com/Hadisalman/smoothing-adversarial.", "full_text": "Provably Robust Deep Learning via Adversarially\n\nTrained Smoothed Classi\ufb01ers\n\nPengchuan Zhang\u2217, Huan Zhang\u2217, Ilya Razenshteyn\u2217, S\u00e9bastien Bubeck\u2217\n\nHadi Salman\u2020, Greg Yang\u00a7, Jerry Li,\n\nMicrosoft Research AI\n\n{hadi.salman, gregyang, jerrl,\n\npenzhan, t-huzhan, ilyaraz, sebubeck }@microsoft.com\n\nAbstract\n\nRecent works have shown the effectiveness of randomized smoothing as a scalable\ntechnique for building neural network-based classi\ufb01ers that are provably robust to\n(cid:96)2-norm adversarial perturbations. In this paper, we employ adversarial training\nto improve the performance of randomized smoothing. 
We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers. We demonstrate through extensive experimentation that our method consistently outperforms all existing provably ℓ2-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable ℓ2-defenses. Moreover, we find that pre-training and semi-supervised learning boost adversarially trained smoothed classifiers even further. Our code and trained models are available at http://github.com/Hadisalman/smoothing-adversarial2.

1 Introduction

Neural networks have been very successful in tasks such as image classification and speech recognition, but have been shown to be extremely brittle to small, adversarially chosen perturbations of their inputs [33, 14]. A classifier (e.g., a neural network) which correctly classifies an image x can be fooled by an adversary into misclassifying x + δ, where δ is an adversarial perturbation so small that x and x + δ are indistinguishable to the human eye. Recently, many works have proposed heuristic defenses intended to train models robust to such adversarial perturbations. However, most of these defenses were broken using more powerful adversaries [4, 2, 35]. This encouraged researchers to develop defenses that lead to certifiably robust classifiers, i.e., classifiers whose predictions for most test examples x can be verified to be constant within a neighborhood of x [39, 27].
Unfortunately, these techniques do not immediately scale to the large neural networks that are used in practice.
To mitigate this limitation of prior certifiable defenses, a number of papers [21, 22, 6] consider the randomized smoothing approach, which transforms any classifier f (e.g., a neural network) into a new smoothed classifier g that has certifiable ℓ2-norm robustness guarantees. This transformation works as follows.
Let f be an arbitrary base classifier which maps inputs in R^d to classes in Y. Given an input x, the smoothed classifier g(x) labels x as having the class c which is most likely to be returned by the base classifier f when fed a noisy corruption x + δ, where δ ∼ N(0, σ²I) is a vector sampled according to an isotropic Gaussian distribution.
As shown in [6], one can derive certifiable robustness for such smoothed classifiers via the Neyman-Pearson lemma. They demonstrate that for ℓ2 perturbations, randomized smoothing outperforms other certifiably robust classifiers that have been previously proposed. It is scalable to networks of any architecture and size, which makes it suitable for building robust real-world neural networks.

∗Reverse alphabetical order. †Work done as part of the Microsoft AI Residency Program. §Primary mentor.
2Please see http://arxiv.org/abs/1906.04584 for the full and most recent version of this paper.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Table 1: Certified top-1 accuracy of our best ImageNet classifiers at various ℓ2 radii.

ℓ2 RADIUS (IMAGENET)    0.5   1.0   1.5   2.0   2.5   3.0   3.5
COHEN ET AL. [6] (%)     49    37    29    19    15    12     9
OURS (%)                 56    45    38    28    26    20    17

Table 2: Certified top-1 accuracy of our best CIFAR-10 classifiers at various ℓ2 radii.

ℓ2 RADIUS (CIFAR-10)      0.25  0.5  0.75  1.0  1.25  1.5  1.75  2.0  2.25
COHEN ET AL. [6] (%)        61   43    32   22    17   13    10    7     4
OURS (%)                    73   58    48   38    33   29    24   18    16
+ PRE-TRAINING (%)          80   62    52   38    34   30    25   19    16
+ SEMI-SUPERVISION (%)      80   63    52   40    34   29    25   19    17
+ BOTH (%)                  81   63    52   37    33   29    25   18    16

Our contributions   In this paper, we employ adversarial training to substantially improve on the previous certified robustness results3 of randomized smoothing [21, 22, 6]. We present, for the first time, a direct attack for smoothed classifiers. We then demonstrate how to use this attack to adversarially train smoothed models with not only boosted empirical robustness but also substantially improved certifiable robustness using the certification method of Cohen et al. [6].
We demonstrate that our method outperforms all existing provably ℓ2-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable ℓ2-defenses. For instance, our ResNet-50 ImageNet classifier achieves 56% provable top-1 accuracy (compared to the best previous provable accuracy of 49%) under adversarial perturbations with ℓ2 norm less than 127/255. Similarly, our ResNet-110 CIFAR-10 smoothed classifier achieves up to a 16% improvement over the previous state-of-the-art, and by combining our technique with pre-training [17] and semi-supervised learning [5], we boost our results to up to a 22% improvement over the previous state-of-the-art. Our main results are reported in Tables 1 and 2 for ImageNet and CIFAR-10.
See Tables 16 and 17 in Appendix G for the standard accuracies corresponding to these results.
Finally, we provide an alternative, more concise proof of the tight robustness guarantee of Cohen et al. [6] by casting it as a nonlinear Lipschitz property of the smoothed classifier. See Appendix A for the complete proof.

2 Our techniques

Here we describe our techniques for adversarial attacks on, and adversarial training of, smoothed classifiers. We first require some background on randomized smoothing classifiers. For a more detailed description of randomized smoothing, see Cohen et al. [6].

2.1 Background on randomized smoothing

Consider a classifier f from R^d to classes Y. Randomized smoothing is a method that constructs a new, smoothed classifier g from the base classifier f. The smoothed classifier g assigns to a query point x the class which is most likely to be returned by the base classifier f under isotropic Gaussian noise perturbation of x, i.e.,

    g(x) = argmax_{c ∈ Y} P(f(x + δ) = c),  where δ ∼ N(0, σ²I).    (1)

The noise level σ is a hyperparameter of the smoothed classifier g which controls a robustness/accuracy tradeoff. Equivalently, this means that g(x) returns the class c whose decision region {x′ ∈ R^d : f(x′) = c} has the largest measure under the distribution N(x, σ²I). Cohen et al.
[6] recently presented a tight robustness guarantee for the smoothed classifier g, and gave Monte Carlo algorithms for certifying the robustness of g around x, or predicting the class of x using g, that succeed with high probability.

3Note that we do not provide a new certification method incorporating adversarial training; the improvements that we get are due to the higher quality of our base classifiers as a result of adversarial training.

Robustness guarantee for smoothed classifiers   The robustness guarantee presented by [6] uses the Neyman-Pearson lemma, and is as follows: suppose that when the base classifier f classifies N(x, σ²I), the class cA is returned with probability pA = P(f(x + δ) = cA), and the "runner-up" class cB is returned with probability pB = max_{c ≠ cA} P(f(x + δ) = c). The smoothed classifier g is robust around x within the radius

    R = (σ/2) (Φ⁻¹(pA) − Φ⁻¹(pB)),    (2)

where Φ⁻¹ is the inverse of the standard Gaussian CDF. It is not clear how to compute pA and pB exactly (if f is given by a deep neural network, for example). Monte Carlo sampling is instead used to estimate a lower bound on pA and an upper bound on pB that hold with arbitrarily high probability over the samples; the result of (2) still holds if we replace pA and pB by these bounds.
This guarantee can in fact be obtained alternatively by explicitly computing the Lipschitz constant of the smoothed classifier, as we do in Appendix A.

2.2 SMOOTHADV: Attacking smoothed classifiers

We now describe our attack against smoothed classifiers. To do so, it will first be useful to describe smoothed classifiers in a more general setting. Specifically, we consider a generalization of (1) to soft classifiers, namely, functions F : R^d → P(Y), where P(Y) is the set of probability distributions over Y.
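As a brief aside, once the bounds on pA and pB are in hand, the certified radius in (2) is a one-line computation. Below is a minimal sketch using Python's standard-library statistics.NormalDist for Φ⁻¹; the helper name and its convention of returning 0 when nothing is certified are our own illustration, not part of the released code:

```python
from statistics import NormalDist

def certified_radius(p_a_lower: float, p_b_upper: float, sigma: float) -> float:
    """Certified l2 radius from Eq. (2): R = sigma/2 * (Phi^-1(pA) - Phi^-1(pB)).

    p_a_lower: lower confidence bound on the top-class probability pA.
    p_b_upper: upper confidence bound on the runner-up probability pB.
    Returns 0.0 when the bounds certify nothing (p_a_lower <= p_b_upper).
    """
    if p_a_lower <= p_b_upper:
        return 0.0
    phi_inv = NormalDist().inv_cdf  # inverse standard Gaussian CDF
    return (sigma / 2.0) * (phi_inv(p_a_lower) - phi_inv(p_b_upper))

# Illustrative numbers: with sigma = 0.5, pA >= 0.9 and pB <= 0.1
# certify a radius of roughly 0.64.
r = certified_radius(0.9, 0.1, sigma=0.5)
```

Note that the radius grows without bound as p_a_lower approaches 1, which is why tight confidence bounds matter in practice.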
Neural networks typically learn such soft classifiers, then use the argmax of the soft classifier as the final hard classifier. Given a soft classifier F, its associated smoothed soft classifier G : R^d → P(Y) is defined as

    G(x) = (F ∗ N(0, σ²I))(x) = E_{δ∼N(0,σ²I)} [F(x + δ)].    (3)

Let f(x) and F(x) denote the hard and soft classifiers learned by the neural network, respectively, and let g and G denote the associated smoothed hard and smoothed soft classifiers. Directly finding adversarial examples for the smoothed hard classifier g is a somewhat ill-behaved problem because of the argmax, so we instead propose to find adversarial examples for the smoothed soft classifier G. Empirically, we found that doing so also finds good adversarial examples for the smoothed hard classifier. More concretely, given a labeled data point (x, y), we wish to find a point x̂ which maximizes the loss of G in an ℓ2 ball around x for some choice of loss function. As is canonical in the literature, we focus on the cross-entropy loss ℓCE. Thus, given a labeled data point (x, y), our (ideal) adversarial perturbation is given by the formula:

    x̂ = argmax_{‖x′ − x‖₂ ≤ ε} ℓCE(G(x′), y) = argmax_{‖x′ − x‖₂ ≤ ε} ( −log E_{δ∼N(0,σ²I)} [ (F(x′ + δ))_y ] ).    (S)

We will refer to (S) as the SMOOTHADV objective. The SMOOTHADV objective is highly non-convex, so, as is common in the literature, we will optimize it via projected gradient descent (PGD) and variants thereof. It is hard to find exact gradients for (S), so in practice we must use some estimator based on random Gaussian samples.
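To make the objective concrete, here is a minimal numpy sketch of the smoothed soft classifier (3) and the SMOOTHADV loss in (S), evaluated by Monte Carlo for a toy linear-softmax base classifier; all names, shapes, and the toy weights are illustrative, not taken from the released code:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smoothed_soft_classifier(F, x, sigma, m, rng):
    """Monte Carlo estimate of G(x) = E_{delta ~ N(0, sigma^2 I)}[F(x + delta)], Eq. (3)."""
    deltas = rng.normal(0.0, sigma, size=(m, x.shape[0]))
    return F(x + deltas).mean(axis=0)

def smoothadv_loss(F, x, y, sigma, m, rng):
    """Cross-entropy of the smoothed soft classifier: the quantity maximized in (S)."""
    g = smoothed_soft_classifier(F, x, sigma, m, rng)
    return -np.log(g[y])

# Toy base soft classifier: linear logits followed by softmax (2 classes, 2 dims).
rng = np.random.default_rng(0)
W = np.array([[2.0, -1.0], [-2.0, 1.0]])
F = lambda X: softmax(X @ W.T)
x = np.array([0.5, 0.0])
loss = smoothadv_loss(F, x, y=0, sigma=0.25, m=64, rng=rng)
```

An attacker then searches for x′ in the ε-ball around x that makes this loss as large as possible.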
There are a number of different natural estimators for the derivative of the objective function in (S), and the choice of estimator can dramatically change the performance of the attack. For more details, see Section 3.
We note that (S) should not be confused with the similar-looking objective

    x̂wrong = argmax_{‖x′ − x‖₂ ≤ ε} E_{δ∼N(0,σ²I)} [ ℓCE(F(x′ + δ), y) ] = argmax_{‖x′ − x‖₂ ≤ ε} E_{δ∼N(0,σ²I)} [ −log (F(x′ + δ))_y ],    (4)

as suggested in Section G.3 of [6]. There is a subtle, but very important, distinction between (S) and (4). Conceptually, solving (4) corresponds to finding an adversarial example of F that is robust to Gaussian noise. In contrast, (S) directly attacks the smoothed model, i.e., it tries to find adversarial examples that decrease the probability of correct classification by the smoothed soft classifier G. From this point of view, (S) is the right optimization problem for finding adversarial examples of G. This distinction turns out to be crucial in practice: empirically, Cohen et al. [6] found attacks based on (4) not to be effective.
Interestingly, for a large class of classifiers, including neural networks, one can alternatively derive the objective (S) from an optimization perspective, by attempting to directly find adversarial examples for the smoothed hard classifier that the neural network provides. While the two derivations ultimately yield the same objective, this perspective may also be enlightening, and so we include it in Appendix B.

2.3 Adversarial training using SMOOTHADV

We now wish to use our new attack to boost the adversarial robustness of smoothed classifiers. We do so using the well-studied adversarial training framework [20, 25].
In adversarial training, given a\ncurrent set of model weights wt and a labeled data point (xt, yt), one \ufb01nds an adversarial perturbation\n\u02c6xt of xt for the current model wt, and then takes a gradient step for the model parameters, evaluated\nat the point (\u02c6xt, yt). Intuitively, this encourages the network to learn to minimize the worst-case loss\nover a neighborhood around the input.\nAt a high level, we propose to instead do adversarial training using an adversarial example for the\nsmoothed classi\ufb01er. We combine this with the approach suggested in Cohen et al. [6], and train at\nGaussian perturbations of this adversarial example. That is, given current set of weights wt and\na labeled data point (xt, yt), we \ufb01nd \u02c6xt as a solution to (S), and then take a gradient step for wt\nbased at gaussian perturbations of \u02c6xt. In contrast to standard adversarial training, we are training the\nbase classi\ufb01er so that its associated smoothed classi\ufb01er minimizes worst-case loss in a neighborhood\naround the current point. For more details of our implementation, see Section 3.2. We emphasize that\nalthough we are training using adversarial examples for the smoothed soft classi\ufb01er, in the end we\ncertify the robustness of the smoothed hard classi\ufb01er we obtain after training.\nWe make two important observations about our method. First, adversarial training is an empirical\ndefense, and typically offers no provable guarantees. However, we demonstrate that by combining\nour formulation of adversarial training with randomized smoothing, we are able to substantially boost\nthe certi\ufb01able robust accuracy of our smoothed classi\ufb01ers. 
Thus, while adversarial training using SMOOTHADV is still ultimately a heuristic, and offers no provable robustness by itself, the smoothed classifier that we obtain using this heuristic has strong certifiable guarantees.
Second, we found empirically that to obtain strong certifiable numbers using randomized smoothing, it is insufficient to use standard adversarial training on the base classifier. While such adversarial training does indeed offer good empirical robust accuracy, the resulting classifier is not optimized for randomized smoothing. In contrast, our method specifically finds base classifiers whose smoothed counterparts are robust. As a result, the certifiable numbers for standard adversarial training are noticeably worse than those obtained using our method. See Appendix C.1 for an in-depth comparison.

3 Implementing SMOOTHADV via first-order methods

As mentioned above, it is difficult to optimize the SMOOTHADV objective, so we approximate it via first-order methods. We focus on two such methods: the well-studied projected gradient descent (PGD) method [20, 25], and the recently proposed decoupled direction and norm (DDN) method [29], which achieves ℓ2 robust accuracy competitive with PGD on CIFAR-10.
The main task when implementing these methods is, given a data point (x, y), to compute the gradient of the objective function in (S) with respect to x′. If we let J(x′) = ℓCE(G(x′), y) denote the objective function in (S), we have

    ∇x′ J(x′) = ∇x′ ( −log E_{δ∼N(0,σ²I)} [ F(x′ + δ)_y ] ).    (5)

However, it is not clear how to evaluate (5) exactly, as it takes the form of a complicated high-dimensional integral. Therefore, we use Monte Carlo approximations. We sample i.i.d. Gaussians δ1, . . . , δm ∼ N(0, σ²I), and use the plug-in estimator for the expectation:

    ∇x′ J(x′) ≈ ∇x′ ( −log ( (1/m) Σ_{i=1}^m F(x′ + δi)_y ) ).    (6)

Pseudocode 1: SMOOTHADV-ersarial Training

function TRAINMINIBATCH((x(1), y(1)), (x(2), y(2)), . . . , (x(B), y(B)))
    ATTACKER ← (SMOOTHADVPGD or SMOOTHADVDDN)
    Generate noise samples δ(j)_i ∼ N(0, σ²I) for 1 ≤ i ≤ m, 1 ≤ j ≤ B
    L ← []    # List of adversarial examples for training
    for 1 ≤ j ≤ B do
        x̂(j) ← x(j)    # Adversarial example
        for 1 ≤ k ≤ T do
            Update x̂(j) according to the k-th step of ATTACKER, where we use the noise samples
            δ(j)_1, δ(j)_2, . . . , δ(j)_m to estimate a gradient of the loss of the smoothed model according to (6)
            # We are reusing the same noise samples between different steps of the attack
        end
        Append ((x̂(j) + δ(j)_1, y(j)), (x̂(j) + δ(j)_2, y(j)), . . . , (x̂(j) + δ(j)_m, y(j))) to L
        # Again, we are reusing the same noise samples for the augmentation
    end
    Run backpropagation on L with an appropriate learning rate

It is not hard to see that if F is smooth, this estimator converges to (5) as we take more samples. In practice, evaluating (6) with m samples requires evaluating the network m times. This becomes expensive for large m, especially if we want to plug it into the adversarial training framework, which is already slow. Thus, when we use this for adversarial training, we use mtrain ∈ {1, 2, 4, 8}.
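For a toy differentiable base classifier, the gradient of the plug-in objective in (6) can be written in closed form; the sketch below does so for a linear-softmax F, together with one ℓ2 projected-gradient-ascent step. For an actual network this gradient would come from backpropagation, and all names, the toy classifier, and the simple projection are our own illustration:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smoothadv_grad_estimate(W, x_adv, y, deltas):
    """Plug-in estimator (6): gradient of -log((1/m) sum_i F(x' + delta_i)_y)
    for a toy linear-softmax base classifier F(x) = softmax(W x)."""
    P = softmax((x_adv + deltas) @ W.T)   # (m, num_classes) class probabilities
    s = P[:, y]                           # F(x' + delta_i)_y for each noise sample
    # d s_i / d x' = s_i * (W[y] - sum_c P_ic * W[c]) for softmax of linear logits
    grad_s = s[:, None] * (W[y] - P @ W)  # (m, d)
    return -grad_s.sum(axis=0) / s.sum()  # the 1/m factors cancel in the ratio

def pgd_step(W, x, x_adv, y, deltas, step_size, eps):
    """One l2 projected-gradient-ascent step on the SmoothAdv objective."""
    g = smoothadv_grad_estimate(W, x_adv, y, deltas)
    g = g / (np.linalg.norm(g) + 1e-12)   # normalized ascent direction
    offset = x_adv + step_size * g - x
    norm = np.linalg.norm(offset)
    if norm > eps:                        # project back onto the eps-ball around x
        offset *= eps / norm
    return x + offset

# Example: a few ascent steps inside an eps = 0.5 ball, reusing the same noise.
rng = np.random.default_rng(0)
W = np.array([[2.0, -1.0], [-2.0, 1.0]])
x = np.array([0.5, 0.0])
deltas = rng.normal(0.0, 0.25, size=(8, 2))
x_adv = x.copy()
for _ in range(5):
    x_adv = pgd_step(W, x, x_adv, y=0, deltas=deltas, step_size=0.2, eps=0.5)
```

Reusing the same noise samples across steps, as in the loop above, mirrors the stabilization trick used in training.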
When we run this attack to evaluate the empirical adversarial accuracy of our models, we use substantially larger choices of m, specifically mtest ∈ {1, 4, 8, 16, 64, 128}. Empirically, we found that increasing m beyond 128 did not substantially improve performance.
While this estimator does converge to the true gradient given enough samples, note that it is not an unbiased estimator of the gradient. Despite this, we found that using (6) performs very well in practice. Indeed, using (6) yields our strongest empirical attacks, as well as our strongest certifiable defenses when we use this attack in adversarial training. In the remainder of the paper, we let SMOOTHADVPGD denote the PGD attack with gradient steps given by (6), and similarly we let SMOOTHADVDDN denote the DDN attack with gradient steps given by (6).

3.1 An unbiased, gradient-free method

We note that there is an alternative way to optimize (S) using first-order methods. Notice that the logarithm in (S) does not change the argmax, and so it suffices to find a minimizer of G(x′)_y subject to the ℓ2 constraint. We then observe that

    ∇x′ (G(x′)_y) = E_{δ∼N(0,σ²I)} [ ∇x′ F(x′ + δ)_y ]  (a)=  E_{δ∼N(0,σ²I)} [ (δ/σ²) · F(x′ + δ)_y ].    (7)

The equality (a) is known as Stein's lemma [32], although we note that something similar can be derived for more general distributions. There is a natural unbiased estimator for (7): sample i.i.d. Gaussians δ1, . . . , δm ∼ N(0, σ²I), and form the estimator ∇x′(G(x′)_y) ≈ (1/m) Σ_{i=1}^m (δi/σ²) · F(x′ + δi)_y. This estimator has a number of nice properties. As mentioned previously, it is an unbiased estimator of (7), in contrast to (6).
It also requires no computation of the gradient of F; if F is a neural network, this saves both time and memory by not storing preactivations during the forward pass. Finally, it is very general: the derivation of (7) actually holds even if F is a hard classifier (or more precisely, the one-hot embedding of a hard classifier). In particular, this implies that the technique can even be used to directly find adversarial examples of the smoothed hard classifier. Despite these appealing features, in practice we find that this attack is quite weak. We speculate that this is because the variance of the gradient estimator is too high. For this reason, in the empirical evaluation we focus on attacks using (6), but we believe that investigating this attack in practice is an interesting direction for future work. See Appendix C.6 for more details.

Figure 1: Comparing our SMOOTHADV-ersarially trained CIFAR-10 classifiers vs. Cohen et al. [6]. (Left) Upper envelopes of certified accuracies over all experiments. (Middle) Upper envelopes of certified accuracies per σ. (Right) Certified accuracies of one representative model per σ. Details of each model used to generate these plots and their certified accuracies are in Tables 7-15 in Appendix G.

3.2 Implementing adversarial training for smoothed classifiers

We incorporate adversarial training into the approach of Cohen et al. [6], changing as few moving parts as possible in order to enable a direct comparison. In particular, we use the same network architectures, batch size, and learning rate schedule. For CIFAR-10, we change the number of epochs, but for ImageNet, we leave it the same. We discuss more of these specifics in Appendix D, and here we describe how to perform adversarial training on a single mini-batch.
The algorithm is shown in Pseudocode 1, with the following parameters: B is the mini-batch size, m is the number of noise samples used for gradient estimation in (6) as well as for Gaussian noise data augmentation, and T is the number of steps of the attack4.

4 Experiments

We primarily compare with Cohen et al. [6], as it was shown to outperform all other scalable provable ℓ2 defenses by a wide margin. As our experiments will demonstrate, our method consistently and significantly outperforms Cohen et al. [6] even further, establishing the state-of-the-art for provable ℓ2-defenses. We run experiments on ImageNet [8] and CIFAR-10 [19]. We use the same base classifiers f as Cohen et al. [6]: a ResNet-50 [16] on ImageNet, and a ResNet-110 on CIFAR-10. Other than the choice of attack (SMOOTHADVPGD or SMOOTHADVDDN) for adversarial training, our experiments are distinguished by five main hyperparameters:

    ε = maximum allowed ℓ2 perturbation of the input
    T = number of steps of the attack
    σ = std. of Gaussian noise data augmentation during training and certification
    mtrain = number of noise samples used to estimate (6) during training
    mtest = number of noise samples used to estimate (6) during evaluation    (♦)

Given a smoothed classifier g, we use the same prediction and certification algorithms, PREDICT and CERTIFY, as [6]. Both algorithms sample base classifier predictions under Gaussian noise. PREDICT outputs the majority vote if the vote count passes a binomial hypothesis test, and abstains otherwise. CERTIFY certifies that the majority vote is robust if the fraction of such votes is higher, by a calculated margin, than the fraction of the next most popular votes, and abstains otherwise.
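The PREDICT side of this can be sketched in a few lines. The two-sided binomial test below is written out from the standard library, and the base-classifier interface, parameter names, and toy classifier are our own placeholders rather than the released implementation:

```python
from collections import Counter
from math import comb

import numpy as np

def binom_two_sided_p(k: int, n: int) -> float:
    """Two-sided p-value for k successes in n fair-coin trials (p = 1/2).
    By symmetry this equals min(1, 2 * P[X >= max(k, n - k)])."""
    k = max(k, n - k)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def predict(base_classifier, x, sigma, n, alpha, rng):
    """Sketch of PREDICT: majority vote of the base classifier over n
    Gaussian-noise samples; abstain (return None) unless the top count
    beats the runner-up in a two-sided binomial test at level alpha."""
    counts = Counter(
        base_classifier(x + rng.normal(0.0, sigma, size=x.shape)) for _ in range(n)
    )
    (top, n_a), (_, n_b) = (counts.most_common(2) + [(None, 0)])[:2]
    return top if binom_two_sided_p(n_a, n_a + n_b) <= alpha else None

# Toy hard base classifier: label 1 iff the coordinates sum to a positive value.
f = lambda z: int(z.sum() > 0)
label = predict(f, np.array([1.0, 1.0]), sigma=0.5, n=100, alpha=0.001,
                rng=np.random.default_rng(0))
```

Near a decision boundary the top two counts become close, the test fails, and the sketch abstains, which is exactly the behavior the failure-rate parameter α governs.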
For details of these algorithms, we refer the reader to [6].
The certified accuracy at radius r is defined as the fraction of the test set which g classifies correctly (without abstaining) and certifies robust at an ℓ2 radius r. Unless otherwise specified, we use the same σ for certification as the one used for training the base classifier f. Note that g is a randomized smoothing classifier, so this reported accuracy is approximate, but it can get arbitrarily close to the true certified accuracy as the number of samples of g increases (see [6] for more details). Similarly, the empirical accuracy is defined as the fraction of the ℓ2 SMOOTHADV-ersarially attacked test set which g classifies correctly (without abstaining). Both PREDICT and CERTIFY have a parameter α defining the failure rate of these algorithms. Throughout the paper, we set α = 0.001 (similar to [6]), which means there is at most a 0.1% chance that PREDICT does not return the most probable class under the smoothed classifier g, or that CERTIFY falsely certifies a non-robust input.

4.1 SMOOTHADV-ersarial training

To assess the effectiveness of our method, we learn a smoothed classifier g that is adversarially trained using (S). Then we compute the certified accuracies5 over a range of ℓ2 radii r. Tables 1 and 2 report the certified accuracies using our method compared to [6]. For all radii, we outperform the certified accuracies of [6] by a significant margin on both ImageNet and CIFAR-10. These results are elaborated below.

4Note that we are reusing the same noise samples during every step of our attack as well as during augmentation. Intuitively, this helps to stabilize the attack process.
5Similar to Cohen et al. [6], we certified the full CIFAR-10 test set and a subsampled ImageNet test set of 500 samples.

Figure 2: Comparing our SMOOTHADV-ersarially trained ImageNet classifiers vs. Cohen et al. [6]. Subfigure captions are the same as in Fig. 1. Details of each model used to generate these plots and their certified accuracies are in Table 6 in Appendix G.

Table 3: Certified ℓ∞ robustness at a radius of 2/255 on CIFAR-10. Note that our models and Carmon et al. [5]'s give accuracies with high probability (W.H.P.).

MODEL                                 ℓ∞ ACC. AT 2/255       STANDARD ACC.
OURS (%)                              68.2 (W.H.P.)          86.2 (W.H.P.)
CARMON ET AL. [5] (%)                 63.8 ± 0.5 (W.H.P.)    80.7 ± 0.3 (W.H.P.)
WONG AND KOLTER [39] (SINGLE) (%)     53.9                   68.3
WONG AND KOLTER [39] (ENSEMBLE) (%)   63.6                   64.1
IBP [15] (%)                          50.0                   70.2

For CIFAR-10   Fig. 1(left) plots the upper envelope of the certified accuracies obtained by choosing the best model for each radius over a grid of hyperparameters. This grid consists of mtrain ∈ {1, 2, 4, 8}, σ ∈ {0.12, 0.25, 0.5, 1.0}, ε ∈ {0.25, 0.5, 1.0, 2.0} (see ♦ for an explanation), and one of the attacks {SMOOTHADVPGD, SMOOTHADVDDN} with T ∈ {2, 4, 6, 8, 10} steps. The certified accuracies of each model can be found in Tables 7-15 in Appendix G. These results are compared to those of Cohen et al. [6] by plotting their reported certified accuracies. Fig.
1(left)\nalso plots the corresponding empirical accuracies using SMOOTHADVPGD with mtest = 128. Note\nthat our certi\ufb01ed accuracies are higher than the empirical accuracies of Cohen et al. [6].\nFig. 1(middle) plots our vs [6]\u2019s best models for varying noise level \u03c3. Fig. 1(right) plots a represen-\ntative model for each \u03c3 from our adversarially trained models. Observe that we outperform [6] in all\nthree plots.\n\nFor ImageNet The results are summarized in Fig. 2, which is similar to Fig. 1 for CIFAR-10, with\nthe difference being the set of smoothed models we certify. This set includes smoothed models\ntrained using mtrain = 1, \u03c3 \u2208 {0.25, 0.5, 1.0}, \u0001 \u2208 {0.5, 1.0, 2.0, 4.0}, and one of the following\nattacks {1-step SMOOTHADVPGD, 2-step SMOOTHADVDDN}. Again, our models outperform those\nof Cohen et al. [6] overall and per \u03c3 as well. The certi\ufb01ed accuracies of each model can be found in\nTable 6 in Appendix G.\nWe point out, as mentioned by Cohen et al. [6], that \u03c3 controls a robustness/accuracy trade-off. When\n\u03c3 is low, small radii can be certi\ufb01ed with high accuracy, but large radii cannot be certi\ufb01ed at all.\nWhen \u03c3 is high, larger radii can be certi\ufb01ed, but smaller radii are certi\ufb01ed at a lower accuracy. This\ncan be observed in the middle and the right plots of Fig. 1 and 2.\n\nEffect on clean accuracy Training smoothed classifers using SMOOTHADV as shown improves\nupon the certi\ufb01ed accuracy of Cohen et al. [6] for each \u03c3, although this comes with the well-known\neffect of adversarial training in decreasing the standard accuracy, so we sometimes see small drops in\nthe accuracy at r = 0, as observed in Fig. 
1(right) and 2(right).

ℓ2 to ℓ∞ certified defense   Since the ℓ2 ball of radius √d contains the ℓ∞ unit ball in R^d, a model robust against ℓ2 perturbations of radius r is also robust against ℓ∞ perturbations of norm r/√d. Via this naive conversion, we find that our ℓ2-robust models enjoy non-trivial ℓ∞ certified robustness. In Table 3, we report the best6 ℓ∞ certified accuracy that we get on CIFAR-10 at a radius of 2/255 (implied by the ℓ2 certified accuracy at a radius of 0.435 ≈ (2/255)·√(3 × 32²)). We exceed the previous state-of-the-art in certified ℓ∞ defenses by at least 3.9%. We obtain similar results for ImageNet certified ℓ∞ defenses at a radius of 1/255, where we exceed the previous state-of-the-art by 8.2%; details are in Appendix F.

Additional experiments and observations   We compare the effectiveness of smoothed classifiers when they are trained SMOOTHADV-ersarially vs. when their base classifier is trained via standard adversarial training (we will refer to the latter as vanilla adversarial training). As expected, because the training objective of SMOOTHADV models aligns with the actual certification objective, those models achieve noticeably more certified robustness over all radii compared to smoothed classifiers resulting from vanilla adversarial training.
We defer the results and details to Appendix C.1.

Furthermore, SMOOTHADV requires the evaluation of (6) as discussed in Section 3. We analyze in Appendix C.2 how the number of Gaussian noise samples mtrain, used in (6) to find adversarial examples, affects the robustness of the resulting smoothed models. As expected, we observe that models trained with higher mtrain tend to have higher certified accuracies.

Finally, we analyze the effect of the maximum allowed ℓ2 perturbation ε used in SMOOTHADV on the robustness of smoothed models in Appendix C.3. We observe that as ε increases, the certified accuracies for small ℓ2 radii decrease, but those for large ℓ2 radii increase, which is expected.

4.2 More Data for Better Provable Robustness

We explore using more data to improve the robustness of smoothed classifiers. Specifically, we pursue two ideas: 1) pre-training similar to [17], and 2) semi-supervised learning as in [5].

Pre-training Hendrycks et al. [17] recently showed that using pre-training can improve the adversarial robustness of classifiers, and achieved state-of-the-art results for empirical ℓ∞ defenses on CIFAR-10 and CIFAR-100. We employ this within our framework: we pre-train smoothed classifiers on ImageNet, then fine-tune them on CIFAR-10. Details can be found in Appendix E.1.

Semi-supervised learning Carmon et al. [5] recently showed that using unlabelled data can improve adversarial robustness as well. They employ a simple, yet effective, semi-supervised learning technique called self-training to improve the robustness of CIFAR-10 classifiers. We employ this idea in our framework, and we train our CIFAR-10 smoothed classifiers via self-training using the unlabelled dataset used in Carmon et al. [5].
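The self-training recipe is simple: fit a model on the labelled data, use it to pseudo-label the unlabelled pool, then retrain on the union. A minimal sketch with a toy nearest-centroid classifier standing in for the smoothed network (all names and data are illustrative, not the paper's setup):

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # "Train": one centroid per class, in sorted label order.
    return np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])

def nearest_centroid_predict(centroids, X):
    # Assign each point to its closest centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
# Toy two-class data: a small labelled set and a large unlabelled pool.
X_lab = rng.normal([[0, 0]] * 10 + [[4, 4]] * 10, 1.0)
y_lab = np.array([0] * 10 + [1] * 10)
X_unl = rng.normal([[0, 0]] * 200 + [[4, 4]] * 200, 1.0)

# Step 1: train a "teacher" on the labelled data.
teacher = nearest_centroid_fit(X_lab, y_lab)
# Step 2: pseudo-label the unlabelled pool with the teacher.
y_pseudo = nearest_centroid_predict(teacher, X_unl)
# Step 3: retrain a "student" on labelled + pseudo-labelled data.
student = nearest_centroid_fit(np.vstack([X_lab, X_unl]),
                               np.concatenate([y_lab, y_pseudo]))
```

In the paper's setting the same three steps apply, with the classifier being a noise-augmented (and SMOOTHADV-trained) network rather than a centroid model.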
Details can be found in Appendix E.2.

We further experiment with combining semi-supervised learning and pre-training; the details are in Appendix E.3. We observe consistent improvement in the certified robustness of our smoothed models when we employ pre-training or semi-supervision. The results are summarized in Table 2.

4.3 Attacking trained models with SMOOTHADV

In this section, we assess the performance of our attack, particularly SMOOTHADVPGD, for finding adversarial examples for the CIFAR-10 randomized smoothing models of Cohen et al. [6]. SMOOTHADVPGD requires the evaluation of (6) as discussed in Section 3. Here, we analyze how sensitive our attack is to the number of samples mtest used in (6) for estimating the gradient of the adversarial objective. Fig. 3 shows the empirical accuracies for various values of mtest. Lower accuracies correspond to a stronger attack. SMOOTHADV with mtest = 1 sample performs worse than the vanilla PGD attack on the base classifier, but as mtest increases, our attack becomes stronger, decreasing the gap between certified and empirical accuracies. We did not observe any noticeable improvement beyond mtest = 128. More details are in Appendix C.4.

While, as discussed here, the success rate of the attack is affected by the number of Gaussian noise samples mtest used by the attacker, it is also affected by the number of Gaussian noise samples n used by the classifier in PREDICT. Indeed, as n increases, abstention due to low confidence becomes more rare, increasing the prediction quality of the smoothed classifier. See a detailed analysis in Appendix C.5.

⁶We report the model with the highest certified ℓ2 accuracy on CIFAR-10 at a radius of 0.435, amongst all our models trained in this paper.

Figure 3: Certified and empirical robust accuracy of Cohen et al. [6]'s models on CIFAR-10.
For each ℓ2 radius r, the certified/empirical accuracy is the maximum over randomized smoothing models trained using σ ∈ {0.12, 0.25, 0.5, 1.0}. The empirical accuracies are found using 20 steps of SMOOTHADVPGD. The closer an empirical curve is to the certified curve, the stronger the corresponding attack (lower is better).

5 Related Work

Recently, many approaches (defenses) have been proposed to build adversarially robust classifiers; these approaches can be broadly divided into empirical defenses and certified defenses.

Empirical defenses are empirically robust to existing adversarial attacks, and the best empirical defense so far is adversarial training [20, 25]. In this kind of defense, a neural network is trained to minimize the worst-case loss over a neighborhood around the input. Although such defenses seem powerful, nothing guarantees that a more powerful, not-yet-known attack would not break them; the most that can be said is that known attacks are unable to find adversarial examples around the data points. In fact, most empirical defenses proposed in the literature were later "broken" by stronger adversaries [4, 2, 35, 1]. To stop this arms race between defenders and attackers, a number of works have focused on building certified defenses, which enjoy formal robustness guarantees.

Certified defenses are provably robust to a specific class of adversarial perturbations, and can guarantee that for any input x, the classifier's prediction is constant within a neighborhood of x. These are typically based on certification methods which are either exact (a.k.a. "complete") or conservative (a.k.a. "sound but incomplete"). Exact methods, usually based on Satisfiability Modulo Theories solvers [18, 11] or mixed-integer linear programming [34, 24, 12], are guaranteed to find an adversarial example around a datapoint if one exists.
Unfortunately, they are computationally inefficient and difficult to scale up to large neural networks. Conservative methods are also guaranteed to detect an adversarial example if one exists, but they might mistakenly flag a safe data point as vulnerable to adversarial examples. On the bright side, these methods are more scalable and efficient, which makes some of them useful for building certified defenses [39, 36, 37, 27, 28, 40, 10, 9, 7, 30, 13, 26, 31, 15, 38, 41]. However, none of them has yet been shown to scale to practical networks that are large and expressive enough to perform well on ImageNet, for example. To scale up to practical networks, randomized smoothing has been proposed as a probabilistically certified defense.

Randomized smoothing A randomized smoothing classifier is not itself a neural network, but uses a neural network as its base for classification. Randomized smoothing was proposed by several works [23, 3] as a heuristic defense without proving any guarantees. Lecuyer et al. [21] first proved robustness guarantees for randomized smoothing classifiers, utilizing inequalities from the differential privacy literature. Subsequently, Li et al. [22] gave a stronger robustness guarantee using tools from information theory. Recently, Cohen et al. [6] provided a tight robustness guarantee for randomized smoothing and consequently achieved the state of the art in ℓ2-norm certified defense.

6 Conclusions

In this paper, we designed an adapted attack for smoothed classifiers, and we showed how this attack can be used in an adversarial training setting to substantially improve the provable robustness of smoothed classifiers.
We demonstrated through extensive experimentation that our adversarially trained smoothed classifiers consistently outperform all existing provably ℓ2-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state of the art for provable ℓ2-defenses.

Acknowledgements

We would like to thank Zico Kolter, Jeremy Cohen, Elan Rosenfeld, Aleksander Madry, Andrew Ilyas, Dimitris Tsipras, Shibani Santurkar, and Jacob Steinhardt for comments and discussions.

References

[1] Anish Athalye and Nicholas Carlini. On the robustness of the CVPR 2018 white-box adversarial example defenses. arXiv preprint arXiv:1804.03286, 2018.

[2] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

[3] Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networks via region-based classification. In Proceedings of the 33rd Annual Computer Security Applications Conference, pages 278–287. ACM, 2017.

[4] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.

[5] Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C. Duchi. Unlabeled data improves adversarial robustness. arXiv preprint arXiv:1905.13736, 2019.

[6] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.

[7] Francesco Croce, Maksym Andriushchenko, and Matthias Hein.
Provable robustness of ReLU networks via maximization of linear regions. arXiv preprint arXiv:1810.07481, 2018.

[8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.

[9] Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, and Pushmeet Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018.

[10] Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. UAI, 2018.

[11] Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 269–286. Springer, 2017.

[12] Matteo Fischetti and Jason Jo. Deep neural networks as 0-1 mixed integer linear programs: A feasibility study. arXiv preprint arXiv:1712.06174, 2017.

[13] Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2018.

[14] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ICLR, 2015.

[15] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[17] Dan Hendrycks, Kimin Lee, and Mantas Mazeika. Using pre-training can improve model robustness and uncertainty. arXiv preprint arXiv:1901.09960, 2019.

[18] Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.

[19] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

[20] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.

[21] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.

[22] Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018.

[23] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369–385, 2018.

[24] Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351, 2017.

[25] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

[26] Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks.
In International Conference on Machine Learning, pages 3575–3583, 2018.

[27] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. International Conference on Learning Representations (ICLR), arXiv preprint arXiv:1801.09344, 2018.

[28] Aditi Raghunathan, Jacob Steinhardt, and Percy S. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10877–10887, 2018.

[29] Jérôme Rony, Luiz G. Hafemann, Luis S. Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. arXiv preprint arXiv:1811.09600, 2018.

[30] Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. In Advances in Neural Information Processing Systems, pages 9832–9842, 2019.

[31] Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10825–10836, 2018.

[32] Charles M. Stein. Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, pages 1135–1151, 1981.

[33] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[34] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HyGIdiRqtm.

[35] Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, and Pushmeet Kohli.
Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018.

[36] Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. MixTrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018.

[37] Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, pages 6369–6379, 2018.

[38] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, and Luca Daniel. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 2018.

[39] Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML), pages 5283–5292, 2018.

[40] Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. Advances in Neural Information Processing Systems (NIPS), 2018.

[41] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel.
Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939–4948, 2018.