{"title": "Certified Adversarial Robustness with Additive Noise", "book": "Advances in Neural Information Processing Systems", "page_first": 9464, "page_last": 9474, "abstract": "The existence of adversarial data examples has drawn significant attention in the deep-learning community; such data are seemingly minimally perturbed relative to the original data, but lead to very different outputs from a deep-learning algorithm. Although a significant body of work on developing defense models has been developed, most such models are heuristic and are often vulnerable to adaptive attacks. Defensive methods that provide theoretical robustness guarantees have been studied intensively, yet most fail to obtain non-trivial robustness when a large-scale model and data are present. To address these limitations, we introduce a framework that is scalable and provides certified bounds on the norm of the input manipulation for constructing adversarial examples. We establish a connection between robustness against adversarial perturbation and additive random noise, and propose a training strategy that can significantly improve the certified bounds. 
Our evaluation on MNIST, CIFAR-10 and ImageNet suggests that our method is scalable to complicated models and large data sets, while providing competitive robustness to state-of-the-art provable defense methods.", "full_text": "arXiv:1809.03113v6 [cs.LG] 10 Nov 2019

Certified Adversarial Robustness with Additive Noise

Bai Li
Department of Statistical Science
Duke University
bai.li@duke.edu

Changyou Chen
Department of CSE
University at Buffalo, SUNY
cchangyou@gmail.com

Wenlin Wang
Department of ECE
Duke University
wenlin.wang@duke.edu

Lawrence Carin
Department of ECE
Duke University
lcarin@duke.edu

Abstract

The existence of adversarial data examples has drawn significant attention in the deep-learning community; such data are seemingly minimally perturbed relative to the original data, but lead to very different outputs from a deep-learning algorithm. Although a significant body of work on defensive models has been considered, most such models are heuristic and are often vulnerable to adaptive attacks. Defensive methods that provide theoretical robustness guarantees have been studied intensively, yet most fail to obtain non-trivial robustness when a large-scale model and data are present. To address these limitations, we introduce a framework that is scalable and provides certified bounds on the norm of the input manipulation for constructing adversarial examples. We establish a connection between robustness against adversarial perturbation and additive random noise, and propose a training strategy that can significantly improve the certified bounds. 
Our evaluation on MNIST, CIFAR-10 and ImageNet suggests that the proposed method is scalable to complicated models and large data sets, while providing competitive robustness to state-of-the-art provable defense methods.

1 Introduction

Although deep neural networks have achieved significant success on a variety of challenging machine learning tasks, including state-of-the-art accuracy on large-scale image classification [1, 2], the discovery of adversarial examples [3] has drawn attention and raised concerns. Adversarial examples are carefully perturbed versions of the original data that successfully fool a classifier. In the image domain, for example, adversarial examples are images that have no visual difference from natural images, but that lead to different classification results [4].

A large body of work has been developed on defensive methods to tackle adversarial examples, yet most remain vulnerable to adaptive attacks [3–10]. A major drawback of many defensive models is that they are heuristic and fail to obtain a theoretically justified guarantee of robustness. On the other hand, many works have focused on providing provable/certified robustness of deep neural networks [11–17].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Recently, [18] provided theoretical insight on certified robust prediction, building a connection between differential privacy and model robustness. It was shown that adding properly chosen noise to the classifier will lead to certified robust prediction. Building on ideas in [18], we conduct an analysis of model robustness based on Rényi divergence [19] between the outputs of models for natural and adversarial examples when random noise is added, and show a higher upper bound on the tolerable size of perturbations compared to [18]. 
In addition, our analysis naturally leads to a connection between adversarial defense and robustness to random noise. Based on this, we introduce a comprehensive framework that incorporates stability training for additive random noise, to improve classification accuracy and hence the certified bound. Our contributions are as follows:

• We derive a certified bound for robustness to adversarial attacks, applicable to models with general activation functions and network structures. Specifically, according to [20], the derived bound for ℓ1 perturbation is tight in the binary case.

• Using our derived bound, we establish a strong connection between robustness to adversarial perturbations and additive random noise. We propose a new training strategy that accounts for this connection. The new training strategy leads to significant improvement on the certified bounds.

• We conduct a comprehensive set of experiments to evaluate both the theoretical and empirical performance of our methods, with results that are competitive with the state of the art.

2 Background and Related Work

Much research has focused on providing provable/certified robustness for neural networks. One line of such studies considers distributionally robust optimization, which aims to provide robustness to changes of the data-generating distribution. For example, [21] study robust optimization over a φ-divergence ball of a nominal distribution. Robustness with respect to the Wasserstein distance between natural and adversarial distributions was provided in [22]. One limitation of distributional robustness is that the divergence between distributions is rarely used as an empirical measure of the strength of adversarial attacks.

Alternatively, studies have attempted to provide a certified bound on the minimum distortion. 
In [11] a certified bound is derived with a small loss in accuracy for robustness to ℓ2 perturbations in two-layer networks. A method based on semi-definite relaxation is proposed in [12] for calculating a certified bound, yet their analysis cannot be applied to networks with more than one hidden layer. A robust optimization procedure is developed in [13] by considering a convex outer approximation of the set of activation functions reachable through a norm-bounded perturbation. Their analysis, however, is still limited to ReLU networks and pure feedforward networks. Algorithms to efficiently compute a certified bound are considered in [14], also by utilizing properties of ReLU networks. Recently, their idea has been extended by [15] and relaxed to general activation functions. However, both of their analyses apply only to the multi-layer perceptron (MLP), limiting the application of their results.

In general, most analyses for certified bounds rely on the properties of specific activation functions or model structures, and are difficult to scale. Several works aim to generalize their analysis to accommodate flexible model structures and large-scale data. For example, [16] formed an optimization problem to obtain an upper bound via Lagrangian relaxation. They successfully obtained the first non-trivial bound for the CIFAR-10 data set. The analysis of [13] was improved in [17], where it was scaled up to large neural networks with general activation functions, obtaining state-of-the-art results on MNIST and CIFAR-10. Certified robustness is obtained by [18] by analyzing the connection between adversarial robustness and differential privacy. Similar to [16, 17], their certified bound is agnostic to model structure and is scalable, but it is loose and is not comparable to [16, 17]. 
Our approach maintains all the advantages of [18], and significantly improves the certified bound through a more refined analysis.

The connection between adversarial robustness and robustness to added random noise has been studied in several works. In [23] this connection is established by exploring the curvature of the classifier decision boundary. Later, [24] showed adversarial robustness requires reducing error rates to essentially zero under large additive noise. While most previous works use concentration of measure as their analysis tool, we approach this connection from a different perspective using Rényi divergence [19]; we illustrate the connection to robustness to random noise in a more direct manner. More importantly, our analysis suggests that improving robustness to additive Gaussian noise can directly result in improvement of the certified bound.

3 Preliminaries

3.1 Notation

We consider the task of image classification. Natural images are represented by x ∈ X ⊂ [0, 1]^{h×w×c}, where X represents the image space, with h, w and c denoting the height, width, and number of channels of an image, respectively. An image classifier over k classes is considered as a function f : X → {1, . . . , k}. We only consider classifiers constructed by deep neural networks (DNNs). To present our framework, we define a stochastic classifier, a function f over x with output f(x) being a multinomial distribution over {1, . . . , k}, i.e., P(f(x) = i) = p_i with Σ_i p_i = 1. One can classify x by picking argmax_i p_i. 
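The stochastic-classifier abstraction can be sketched as follows; a minimal numpy sketch, where the toy base classifier and all names are illustrative and not from the paper:

```python
import numpy as np

def smoothed_distribution(base_classifier, x, sigma, n, k, rng):
    """Monte Carlo estimate of the stochastic classifier's output distribution
    (p_1, ..., p_k), where p_i = P(f(x + N(0, sigma^2 I)) = i)."""
    counts = np.zeros(k)
    for _ in range(n):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        counts[base_classifier(noisy)] += 1.0
    return counts / n

# Toy base classifier over k = 3 classes: nearest of three scalar anchors
# to the mean pixel value (purely illustrative).
anchors = np.array([0.0, 0.5, 1.0])
def toy_classifier(x):
    return int(np.argmin(np.abs(anchors - x.mean())))

rng = np.random.default_rng(0)
p = smoothed_distribution(toy_classifier, np.full(4, 0.1), sigma=0.05, n=200, k=3, rng=rng)
# One then classifies by picking argmax_i p_i.
```

The returned vector sums to one by construction, and the hard label is its argmax.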
Note this distribution is different from the one generated by softmax.

3.2 Rényi Divergence

Our theoretical result depends on the Rényi divergence, defined in the following [19].

Definition 1 (Rényi Divergence) For two probability distributions P and Q over R, the Rényi divergence of order α > 1 is

D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log \mathbb{E}_{x \sim Q} \left( \frac{P}{Q} \right)^\alpha .   (1)

3.3 Adversarial Examples

Given a classifier f : X → {1, . . . , k} and an image x ∈ X, an adversarial example x′ satisfies D(x, x′) < ε for some small ε > 0, and f(x) ≠ f(x′), where D(·, ·) is some distance metric; i.e., x′ is close to x but yields a different classification result. The distance is often described in terms of an ℓp metric, and in most of the literature the ℓ2 and ℓ∞ metrics are considered. In our development, we focus on the ℓ2 metric but also provide experimental results for ℓ∞. More general definitions of adversarial examples are considered in some works [25], but we only address norm-bounded adversarial examples in this paper.

3.4 Adversarial Defense

Classification models that are robust to adversarial examples are referred to as adversarial defense models. We introduce the most advanced defense models in two categories.

Empirically, the most successful defense model is based on adversarial training [4, 26], that is, augmenting training with adversarial examples to help improve model robustness. TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization (TRADES) [27] is a variant of adversarial training that introduces a regularization term for adversarial examples:

L = L(f(x), y) + \gamma L(f(x), f(x_{adv}))

where L(·, ·) is the cross-entropy loss. 
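In code, the TRADES objective above reads as follows; a minimal numpy sketch on precomputed logits, with cross-entropy as L(·, ·) (names and toy values are illustrative, not from the paper):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, q):
    # L(p, q) = -sum_i p_i log q_i
    return -np.sum(p * np.log(q + 1e-12))

def trades_loss(logits_x, logits_xadv, y_onehot, gamma):
    """L = L(f(x), y) + gamma * L(f(x), f(x_adv))."""
    p_x = softmax(logits_x)
    p_adv = softmax(logits_xadv)
    return cross_entropy(y_onehot, p_x) + gamma * cross_entropy(p_x, p_adv)

y = np.array([1.0, 0.0, 0.0])
clean = np.array([3.0, 0.0, 0.0])
loss_same = trades_loss(clean, clean, y, gamma=1.0)          # x_adv = x
loss_diff = trades_loss(clean, np.array([0.0, 3.0, 0.0]), y, gamma=1.0)
```

Since the cross-entropy between f(x) and f(x_adv) is minimized when the two outputs agree, the regularizer penalizes adversarial inputs that shift the prediction.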
This defense model won 1st place in the NeurIPS 2018 Adversarial Vision Challenge (Robust Model Track) and has shown better performance compared to previous models [26].

On the other hand, the state-of-the-art approach for provable robustness is proposed by [17], where a dual network is considered for computing a bound on the adversarial perturbation using linear programming (LP), as in [13]. They optimize the bound during training to achieve strong provable robustness.

Although empirical robustness and provable robustness are often considered as orthogonal research directions, we propose an approach that provides both. In our experiments, presented in Sec. 6, we show our approach is competitive with both of the aforementioned methods.

4 Certified Robust Classifier

We propose a framework that yields an upper bound on the tolerable size of attacks, enabling certified robustness on a classifier. Intuitively, our approach adds random noise to pixels of adversarial examples before classification, to eliminate the effects of adversarial perturbations.

Algorithm 1 Certified Robust Classifier
Require: An input image x; a standard deviation σ > 0; a classifier f over {1, . . . , k}; a number of iterations n (n = 1 is sufficient if only the robust classification c is desired).
1: Set i = 1.
2: for i ∈ [n] do
3:   Add i.i.d. Gaussian noise N(0, σ²) to each pixel of x and apply the classifier f on it. Let the output be c_i = f(x + N(0, σ²I)).
4: end for
5: Estimate the distribution of the output as p_j = #{c_i = j : i = 1, . . . , n} / n.
6: Calculate the upper bound:

L = \sup_{\alpha > 1} \left( -\frac{2\sigma^2}{\alpha} \log\left( 1 - p_{(1)} - p_{(2)} + 2\left( \tfrac{1}{2}\left( p_{(1)}^{1-\alpha} + p_{(2)}^{1-\alpha} \right) \right)^{\frac{1}{1-\alpha}} \right) \right)^{1/2}

where p_{(1)} and p_{(2)} are the first and the second largest values in p_1, . . . , p_k.
7: Return classification result c = argmax_i p_i and the tolerable size of the attack L.

Our approach is summarized in Algorithm 1. In the following, we develop theory to prove the certified robustness of the proposed algorithm. Our goal is to show that if the classification of x in Algorithm 1 is in class c, then for any example x′ such that ‖x − x′‖₂ ≤ L, the classification of x′ is also in class c.

To prove our claim, first recall that a stochastic classifier f over {1, . . . , k} has an output f(x) corresponding to a multinomial distribution over {1, . . . , k}, with probabilities (p_1, . . . , p_k). In this context, robustness to an adversarial example x′ generated from x means argmax_i p_i = argmax_j p′_j with P(f(x) = i) = p_i and P(f(x′) = j) = p′_j, where P(·) denotes the probability of an event. In the remainder of this section, we show Algorithm 1 achieves such robustness based on the Rényi divergence, starting with the following lemma.

Lemma 1 Let P = (p_1, . . . , p_k) and Q = (q_1, . . . , q_k) be two multinomial distributions over the same index set {1, . . . , k}. If the indices of the largest probabilities do not match on P and Q, that is argmax_i p_i ≠ argmax_j q_j, then

D_\alpha(Q \| P) \ge -\log\left( 1 - p_{(1)} - p_{(2)} + 2\left( \tfrac{1}{2}\left( p_{(1)}^{1-\alpha} + p_{(2)}^{1-\alpha} \right) \right)^{\frac{1}{1-\alpha}} \right)

where p_{(1)} and p_{(2)} are the largest and the second largest probabilities among the set of all p_i.

To simplify notation, we define M_p(x_1, . . . , x_n) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^p \right)^{1/p} as the generalized mean. The right-hand side (RHS) of the condition in Lemma 1 then becomes

-\log\left( 1 - 2M_1(p_{(1)}, p_{(2)}) + 2M_{1-\alpha}(p_{(1)}, p_{(2)}) \right).

Lemma 1 gives a lower bound on the Rényi divergence required for changing the index of the maximum of P; i.e., for any distribution Q, if D_\alpha(Q \| P) < -\log\left( 1 - 2M_1(p_{(1)}, p_{(2)}) + 2M_{1-\alpha}(p_{(1)}, p_{(2)}) \right), the index of the maximum of P and Q must be the same. Based on Lemma 1, we obtain our main theorem on certified robustness as follows, validating our claim.

Theorem 2 Suppose we have x ∈ X, and a potential adversarial example x′ ∈ X such that ‖x − x′‖₂ ≤ L. Given a k-classifier f : X → {1, . . . , k}, let f(x + N(0, σ²I)) ∼ (p_1, . . . , p_k) and f(x′ + N(0, σ²I)) ∼ (p′_1, . . . , p′_k). If the following condition is satisfied, with p_{(1)} and p_{(2)} being the first and second largest probabilities in {p_i}:

\sup_{\alpha > 1} -\frac{2\sigma^2}{\alpha} \log\left( 1 - 2M_1(p_{(1)}, p_{(2)}) + 2M_{1-\alpha}(p_{(1)}, p_{(2)}) \right) \ge L^2,

then argmax_i p_i = argmax_j p′_j.

The conclusion of Theorem 2 can be extended to the ℓ1 case by replacing Gaussian with Laplacian noise. 
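For the Gaussian case, the ℓ2 bound of Theorem 2 (step 6 of Algorithm 1) can be evaluated numerically by a grid search over α > 1; a minimal sketch, in which the function name and the α-grid are illustrative choices:

```python
import numpy as np

def certified_l2_bound(p1, p2, sigma, alphas=np.linspace(1.01, 50.0, 4000)):
    """Largest tolerable l2 perturbation from Theorem 2:
    L = sup_{alpha > 1} sqrt( (2 sigma^2 / alpha) *
        (-log(1 - p1 - p2 + 2 * M_{1-alpha}(p1, p2))) ),
    where M_t(a, b) = ((a^t + b^t)/2)^(1/t) is the generalized mean and
    p1 >= p2 are the two largest smoothed class probabilities."""
    best = 0.0
    for a in alphas:
        m = (0.5 * (p1 ** (1.0 - a) + p2 ** (1.0 - a))) ** (1.0 / (1.0 - a))
        inner = 1.0 - p1 - p2 + 2.0 * m
        if 0.0 < inner < 1.0:  # -log(inner) > 0, so the bound is real
            best = max(best, np.sqrt(-2.0 * sigma ** 2 / a * np.log(inner)))
    return best
```

The bound scales linearly with σ, grows with the gap between p1 and p2, and vanishes when p1 = p2.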
For the Laplacian case, note that the Rényi divergence between two Laplace distributions Λ(x, λ) and Λ(x′, λ) is

\frac{1}{\alpha - 1} \log\left( \frac{\alpha}{2\alpha - 1} \exp\left( \frac{(\alpha - 1)\|x - x'\|_1}{\lambda} \right) + \frac{\alpha - 1}{2\alpha - 1} \exp\left( -\frac{\alpha \|x - x'\|_1}{\lambda} \right) \right) \xrightarrow{\alpha \to \infty} \frac{\|x - x'\|_1}{\lambda} .

Meanwhile, -\log\left( 1 - 2M_1(p_{(1)}, p_{(2)}) + 2M_{1-\alpha}(p_{(1)}, p_{(2)}) \right) \xrightarrow{\alpha \to \infty} -\log(1 - p_{(1)} + p_{(2)}), thus we have the upper bound for the ℓ1 perturbation:

Theorem 3 In the same setting as in Theorem 2, with ‖x − x′‖₁ ≤ L, let f(x + Λ(0, λ)) ∼ (p_1, . . . , p_k) and f(x′ + Λ(0, λ)) ∼ (p′_1, . . . , p′_k). If −λ log(1 − p_{(1)} + p_{(2)}) ≥ L is satisfied, then argmax_i p_i = argmax_j p′_j.

In the rest of this paper, we focus on the ℓ2 norm with Gaussian noise, but most conclusions are also applicable to the ℓ1 norm with Laplacian noise. A more comprehensive analysis for the ℓ1 norm can be found in [20]. Interestingly, they have proved that the bound −λ log(1 − p_{(1)} + p_{(2)}) is tight in the binary case for the ℓ1 norm [20].

With Theorem 2, we can enable certified ℓ2 (ℓ1) robustness on any classifier f by adding i.i.d. Gaussian (Laplacian) noise to the pixels of inputs during testing, as done in Algorithm 1. 
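Theorem 3's ℓ1 bound needs no optimization over α and is a one-liner; a minimal sketch (function name illustrative):

```python
import numpy as np

def certified_l1_bound(p1, p2, lam):
    """Theorem 3: with additive Laplace(0, lam) noise, the classification is
    provably unchanged under any l1 perturbation smaller than
    -lam * log(1 - p1 + p2), where p1 >= p2 are the two largest smoothed
    class probabilities."""
    return -lam * np.log(1.0 - p1 + p2)
```

As with the Gaussian case, the bound scales linearly with the noise scale λ and vanishes as the top-two probabilities approach each other.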
Algorithm 1 provides an upper bound on the tolerable size of attacks for a classifier, i.e., as long as the perturbation size is less than the upper bound (the "sup" part in Theorem 2), any adversarial example can be classified correctly.

Confidence Interval and Sample Size In practice we can only estimate p_{(1)} and p_{(2)} from samples, thus the obtained lower bound is not precise and requires adjustment. Note that (p_1, . . . , p_k) forms a multinomial distribution, and therefore the confidence intervals for p_{(1)} and p_{(2)} can be estimated using one-sided Clopper-Pearson intervals along with a Bonferroni correction. We refer to [18] for further details. In all our subsequent experiments, we use the end points (lower for p_{(1)} and upper for p_{(2)}) of the 95% confidence intervals when estimating p_{(1)} and p_{(2)}, and multiply the corresponding accuracy by 95%. Moreover, the estimates for the confidence intervals are more precise when we increase the sample size n, but at the cost of extra computational burden. In practice, we find a sample size of n = 100 is sufficient.

Choice of σ The formula of our lower bound indicates that a higher standard deviation σ results in a higher bound. In practice, however, a larger amount of added noise also leads to higher classification error and a smaller gap between p_{(1)} and p_{(2)}, which lowers the bound. Therefore, the best choice of σ² is not obvious. We will demonstrate the effect of different choices of σ² in the experiments of Sec. 6.

5 Improved Certified Robustness

Based on the properties of the generalized mean, one can show that the upper bound is larger when the difference between p_{(1)} and p_{(2)} becomes larger. This is consistent with the intuition that a larger difference between p_{(1)} and p_{(2)} indicates more confident classification. 
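The Clopper-Pearson/Bonferroni adjustment described in the confidence-interval paragraph above can be sketched as follows, assuming scipy is available; the function name and the example counts are illustrative:

```python
from scipy.stats import beta

def adjusted_top_two(count1, count2, n, alpha=0.05):
    """Conservative estimates of p_(1) and p_(2) from Monte Carlo counts:
    a lower confidence bound for p_(1) and an upper confidence bound for
    p_(2), each one-sided at level alpha/2 (a Bonferroni split across the
    two estimated quantities)."""
    a = alpha / 2.0
    # One-sided Clopper-Pearson end points via the beta quantile function.
    p1_lower = beta.ppf(a, count1, n - count1 + 1) if count1 > 0 else 0.0
    p2_upper = beta.ppf(1.0 - a, count2 + 1, n - count2) if count2 < n else 1.0
    return p1_lower, p2_upper

# E.g. 90 of 100 noisy classifications landed in the top class, 5 in the runner-up.
p1_lo, p2_hi = adjusted_top_two(90, 5, 100)
```

Plugging these conservative end points into the bound of Theorem 2 can only shrink the certified radius, so the reported certificate still holds at the stated confidence level.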
In other words, more confident and accurate prediction in the presence of additive Gaussian noise, in the sense that p_{(1)} is much larger than p_{(2)}, leads to better certified robustness. In this way, our analysis establishes a connection between robustness to adversarial examples and robustness to added random noise.

Such a connection is beneficial, because robustness to additive Gaussian noise is much easier to achieve than robustness to carefully crafted adversarial examples. Consequently, it is natural to consider improving the adversarial robustness of a model by first improving its robustness to added random noise. In the context of Algorithm 1, we aim to improve the robustness of f to additive random noise. Note that improving robustness to added Gaussian noise as a way of improving adversarial robustness has been proposed by [28] and was later shown ineffective [29]. Our method is different in that it requires added Gaussian noise during the testing phase, and more importantly it is supported theoretically.

There have been notable efforts at developing neural networks that are robust to added random noise [30, 31]; yet, these methods failed to defend against adversarial attacks, as they are not particularly designed for this task. Within our framework, Algorithm 1 places no constraint on the classifier f, which gives us the flexibility to modify f; we can therefore adapt these methods to improve the accuracy of classification when Gaussian noise is present, hence improving the robustness to adversarial attacks. In this paper, we only discuss stability training, but a much wider scope of literature exists for robustness to added random noise [30, 31].

5.1 Stability Training

The idea of introducing perturbations during training to improve model robustness has been studied widely. 
In [32] the authors considered perturbing models as a construction of pseudo-ensembles, to improve semi-supervised learning. More recently, [33] used a similar training strategy, named stability training, to improve classification robustness on noisy images.

For any natural image x, stability training encourages its perturbed version x′ to yield a similar classification result under a classifier f, i.e., D(f(x), f(x′)) is small for some distance measure D. Specifically, given a loss function L* for the original classification task, stability training introduces a regularization term: L(x, x′) = L* + γ L_stability(x, x′) = L* + γ D(f(x), f(x′)), where γ controls the strength of the stability term. As we are interested in a classification task, we use cross-entropy as the distance D between f(x) and f(x′), yielding the stability loss

L_{stability} = -\sum_j P(y_j \mid x) \log P(y_j \mid x'),

where P(y_j | x) and P(y_j | x′) are the probabilities generated after softmax. In this paper, we add i.i.d. Gaussian noise to each pixel of x to construct x′, as suggested in [33].

Stability training is in the same spirit as adversarial training, but is only designed to improve the classification accuracy under a Gaussian perturbation. Within our framework, we can apply stability training to f, to improve the robustness of Algorithm 1 against adversarial perturbations. We call the resulting defense method Stability Training with Noise (STN).

Adversarial Logit Pairing Adversarial Logit Pairing (ALP) was proposed in [34]; it adds D(f(x), f(x′)) as a regularizer, with x′ being an adversarial example. Subsequent work has shown ALP fails to obtain adversarial robustness [35]. 
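The stability-training objective of Sec. 5.1 can be sketched as follows; an illustrative numpy sketch with a toy linear model, where the model, names, and values are assumptions for demonstration, not the paper's architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stability_objective(W, x, y, sigma, gamma, rng):
    """L = L* + gamma * L_stability with a toy linear model f(x) = softmax(W x).
    L* is the cross-entropy on the clean input, and
    L_stability = -sum_j P(y_j | x) log P(y_j | x') for x' = x + N(0, sigma^2 I)."""
    x_noisy = x + rng.normal(0.0, sigma, size=x.shape)   # construct x'
    p_clean = softmax(W @ x)
    p_noisy = softmax(W @ x_noisy)
    task_loss = -np.log(p_clean[y] + 1e-12)              # L* (cross-entropy)
    stability = -np.sum(p_clean * np.log(p_noisy + 1e-12))
    return task_loss + gamma * stability

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
loss = stability_objective(W, x, y=0, sigma=0.1, gamma=1.0, rng=rng)
```

Both terms are nonnegative cross-entropies, so the combined loss is always positive; in an actual training loop the gradient of this objective with respect to the model parameters would be taken.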
Our method is different from ALP and any other regularizer-based approach, as the key component in our framework is the added Gaussian noise at the testing phase of Algorithm 1, while stability training is merely a technique for improving the robustness further. We do not claim that stability training alone yields adversarial robustness.

6 Experiments

We perform experiments on the MNIST and CIFAR-10 data sets, to evaluate the theoretical and empirical performance of our methods. We subsequently also consider the larger ImageNet dataset. For the MNIST data set, the model architecture follows the models used in [36], which contain two convolutional layers, each with 64 filters, followed by a fully connected layer of size 128. For the CIFAR-10 dataset, we use a convolutional neural network with seven convolutional layers along with MaxPooling. In both datasets, image intensities are scaled to [0, 1], and the sizes of attacks are also rescaled accordingly. For reference, a distortion of 0.031 in the [0, 1] scale corresponds to 8 in the [0, 255] scale. The source code can be found at https://github.com/Bai-Li/STN-Code.

6.1 Theoretical Bound

With Algorithm 1, we are able to classify a natural image x and calculate an upper bound on the tolerable size of attacks L for this particular image. Thus, for a given attack size L*, the classification must be robust if L* < L. If a natural example is correctly classified and robust for L* simultaneously, any adversarial example x′ with ‖x − x′‖₂ < L* will be classified correctly. Therefore, we can determine the proportion of such examples in the test set as a lower bound on accuracy for a given size L*.

We plot in Figure 1 different lower bounds for various choices of σ and L*, for both MNIST and CIFAR-10. 
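The accuracy lower bound at a given attack size L* is exactly the fraction of test examples that are both classified correctly and certified at radius above L*; a minimal sketch with illustrative per-example data:

```python
import numpy as np

def certified_accuracy(correct, radii, attack_size):
    """Fraction of test examples that are correctly classified AND whose
    certified radius L exceeds the attack size L*; this is a valid lower
    bound on accuracy under any attack of norm below L*."""
    correct = np.asarray(correct, bool)
    radii = np.asarray(radii, float)
    return float(np.mean(correct & (radii > attack_size)))

# Illustrative per-example results: (correctness flag, certified radius L).
correct = [True, True, True, False, True]
radii = [1.6, 0.9, 2.1, 3.0, 0.2]
acc_at_1 = certified_accuracy(correct, radii, attack_size=1.0)
```

Sweeping `attack_size` over a grid of L* values produces a curve of the kind plotted in Figure 1, which is nonincreasing in L* by construction.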
To interpret the results, for example on MNIST, when σ = 0.7, Algorithm 1 achieves at least 51% accuracy under any attack whose ℓ2-norm size is smaller than 1.4.

Figure 1: Accuracy lower bounds for MNIST (left) and CIFAR-10 (right). We test various choices of σ in Algorithm 1. For reference, we include results for PixelDP (green) and the lower bound without stability training (orange).

When σ is fixed, there exists a threshold beyond which the certified lower bound degenerates to zero. The larger the deviation σ is, the later the degeneration happens. 
However, a larger σ also leads to a worse bound when L* is small, as the added noise reduces the accuracy of classification.

As an ablation study, we include the corresponding lower bound without stability training. The improvement due to stability training is significant. In addition, to demonstrate the improvement of the certified bound compared to PixelDP [18], we also include the results for PixelDP. Although PixelDP also has tuning parameters δ and ε similar to σ in our setting, we only include the optimal pair of parameters found by grid search, for simplicity of the plots. One observes the accuracy lower bounds for PixelDP (green) are dominated by our bounds.

We also compare STN with the approach from [17]. Besides training a single robust classifier, [17] also proposed a strategy of training a sequence of robust classifiers as a cascade model, which results in better provable robustness, although it reduces the accuracy on natural examples. We compare both to STN in Table 1. Since both methods show a clear trade-off between the certified bound and the corresponding robust accuracy, we include the certified bounds and corresponding accuracy in parentheses for both models, along with the accuracy on natural examples.

Table 1: Comparison on MNIST and CIFAR-10. The numbers "a (b%)" mean a certified bound a with the corresponding accuracy b%.

Model           | MNIST Robust Accuracy | MNIST Natural | CIFAR-10 Robust Accuracy | CIFAR-10 Natural
[17] (Single)   | 1.58 (43.5%)          | 88.2%         | 36.00 (53.0%)            | 61.2%
[17] (Cascade)  | 1.58 (74.6%)          | 81.4%         | 36.00 (58.7%)            | 68.8%
STN             | 1.58 (69.0%)          | 98.9%         | 36.00 (65.6%)            | 80.5%

Our bound is close to the one from [17] on MNIST, and becomes better on CIFAR-10. 
In addition, since the training objective of [17] is particularly designed for provable robustness and depends on the size of attacks, its accuracy on natural examples decreases drastically when accommodating stronger attacks. On the other hand, STN is capable of keeping a high accuracy on natural examples while providing strong robustness guarantees.

Certified Robustness on ImageNet As our framework adds almost no extra computational burden on the training procedure, we are able to compute accuracy lower bounds for ImageNet [37], a large-scale image dataset that contains over 1 million images and 1,000 classes. We compare our bound with PixelDP in Figure 2. Clearly, our bound is higher than the one obtained via PixelDP.

Figure 2: Comparison of the certified bound from STN (orange) and PixelDP (blue) on ImageNet.

6.2 Empirical Results

We next perform classification and measure the accuracy on real adversarial examples to evaluate the empirical performance of our defense methods. For each pair of attacks and defense models, we generate a robust accuracy vs. perturbation size curve for a comprehensive evaluation. We compare our method to TRADES on MNIST and CIFAR-10. Although we have emphasized the theoretical bound of the defense, the empirical performance is promising as well. 
More details and results of the experiments, such as for gradient-free attacks, are included in the Appendix.

Avoid Gradient Masking A defensive model incorporating randomness may make it difficult to apply standard attacks, by causing gradient masking as discussed in [10], thus achieving robustness unfairly. To ensure the robustness of our approach is not due to gradient masking, we use the expectation of the gradient with respect to the randomization when estimating gradients, to ensemble over randomization and eliminate the effect of randomness, as recommended in [10, 38]. In particular, the gradient is estimated as

\mathbb{E}_{r \sim N(0, \sigma^2 I)}\left[ \nabla_{x+r} L(\theta, x + r, y) \right] \approx \frac{1}{n_0} \sum_{i=1}^{n_0} \nabla_{x+r_i} L(\theta, x + r_i, y),

where the r_i are i.i.d. samples from the N(0, σ²I) distribution, and n_0 is the number of samples. We assume threat models are aware of the value of σ in Algorithm 1 and use the same value for attacks.

White-box Attacks For ℓ∞ attacks, we use Projected Gradient Descent (PGD) attacks [26]. PGD constructs an adversarial example by iteratively updating the natural input along the sign of its gradient and projecting the result into the constrained space, to ensure it is a valid input. For ℓ2 attacks, we perform a Carlini & Wagner attack [8], which constructs an adversarial example by solving an optimization problem that minimizes the distortion distance while maximizing the classification error. We also use a technique from [39] that has been shown to be more effective against adversarially trained models, where the gradients are estimated as the average of the gradients of multiple randomly perturbed samples. This yields a variant of the Carlini & Wagner attack with the same form as the ensemble-over-randomization mentioned above, therefore it is even fairer to use it. 
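The ensemble-over-randomization gradient estimate above can be sketched generically; a minimal numpy sketch in which the toy loss L(x) = ‖x‖²/2 (with gradient x) stands in for the network loss:

```python
import numpy as np

def eot_gradient(grad_fn, x, sigma, n0, rng):
    """Estimate E_{r ~ N(0, sigma^2 I)}[grad L(x + r)] by averaging the
    gradient at n0 randomly perturbed copies of x (ensemble over
    randomization), as used to avoid gradient masking."""
    g = np.zeros_like(x)
    for _ in range(n0):
        r = rng.normal(0.0, sigma, size=x.shape)
        g += grad_fn(x + r)
    return g / n0

# Toy loss L(x) = ||x||^2 / 2 has gradient x, so the smoothed estimate
# should stay close to x itself: the noise has zero mean.
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])
g = eot_gradient(lambda z: z, x, sigma=0.1, n0=500, rng=rng)
```

In an actual attack, `grad_fn` would be the backpropagated gradient of the classifier's loss, and the averaged gradient would drive the PGD or Carlini & Wagner update.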
The results for white-box attacks are illustrated in Figure 3.

Figure 3: MNIST and CIFAR-10: Comparisons of the adversarial robustness of TRADES and STN with various attack sizes for both ℓ2 and ℓ∞. The plots are ordered as: MNIST(ℓ2), MNIST(ℓ∞), CIFAR-10(ℓ2), and CIFAR-10(ℓ∞).
Both white-box (solid lines) and black-box (dashed lines) attacks are considered.

Figure 4: Robust accuracy of STN with different choices of σ for both
ℓ2 and ℓ∞ attacks. The plots are ordered as: MNIST(ℓ2), MNIST(ℓ∞), CIFAR-10(ℓ2), and CIFAR-10(ℓ∞).

Black-box Attacks To better understand the behavior of our methods and to further ensure there is no gradient masking, we include results for black-box attacks. Upon comparison, we find that adversarial examples generated from Madry's model [26] result in the strongest black-box attacks for both TRADES and STN. Therefore, we apply the ℓ2 and ℓ∞ white-box attacks to Madry's model and test the resulting adversarial examples on TRADES and STN. The results are reported as the dashed lines in Figure 3.

Summary of Results Overall, STN shows a promising level of robustness, especially regarding ℓ2-bounded distortions, as anticipated. One observes that STN performs slightly worse than TRADES when the size of attacks is small, and becomes better as the size increases. Intuitively, the added random noise dominates the loss in accuracy for small attack sizes but becomes beneficial against stronger attacks. It is worth noting that Algorithm 1 adds almost no computational burden, as it only requires multiple forward passes, and stability training only requires augmenting with randomly perturbed examples. On the other hand, TRADES is extremely time-consuming, due to the iterative construction of adversarial examples.

Choice of σ In previous experiments, we use σ = 0.7 and σ = 100/255 for MNIST and CIFAR-10, respectively. However, the choice of σ plays an important role, as shown in Figure 1; therefore, we study in Figure 4 how different values of σ affect the empirical robust accuracy.
The results make it clearer that the noise hurts when small attacks are considered, but helps against large attacks. Ideally, using an adaptive amount of noise, letting the amount of added noise grow with the size of attacks, could lead to better empirical results; in practice, however, this is impossible, as the size of attacks is unknown beforehand. In addition, we include results for σ = 0, which is equivalent to a model without additive Gaussian noise. Its vulnerability indicates that the additive Gaussian noise is essential to our framework.

7 Comparison to [40]

Following this work, [40] proposed a tighter bound in ℓ2 norm than the one in Section 4. Although they do indeed improve our bound, our work makes unique contributions in several ways: (i) We propose stability training to improve the bound and robustness, while they only use Gaussian augmentation. In general, stability training works better than Gaussian augmentation, as shown in Figure 1. Thus, stability training is an important and unique contribution of this paper. (ii) We conduct empirical evaluation against real attacks and compare to the state-of-the-art defense method (adversarial training) to show our approach is competitive. [40] only discusses the certified bound and does not provide evaluation against real attacks. (iii) The analysis from [40] is difficult to extend to other norms, because it requires isotropy. Ours, on the other hand, leads to a tight certified ℓ1 bound by adding Laplacian noise, as discussed in Section 4.

8 Conclusions

We propose an analysis for constructing defensive models with certified robustness. Our analysis leads to a connection between robustness to adversarial attacks and robustness to additive random perturbations. We then propose a new strategy based on stability training for improving the robustness of the defense models.
The experimental results show that our defense model provides competitive provable robustness and empirical robustness compared to state-of-the-art models. It yields especially strong robustness when strong attacks are considered.
There is a noticeable gap between the theoretical lower bounds and the empirical accuracy, indicating either that the proposed bound might not be tight, or that the empirical results would be worse under stronger attacks that have not yet been developed, as has happened to many defense models. We believe each explanation points to a direction for future research.

Acknowledgments This research was supported in part by DARPA, DOE, NIH, NSF and ONR.

References

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.

[2] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.

[3] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[4] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[5] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.

[6] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597.
IEEE, 2016.

[7] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.

[8] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.

[9] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.

[10] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

[11] Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, pages 2266–2276, 2017.

[12] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.

[13] J Zico Kolter and Eric Wong. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.

[14] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S Dhillon, and Luca Daniel. Towards fast computation of certified robustness for ReLU networks. arXiv preprint arXiv:1804.09699, 2018.

[15] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4944–4953, 2018.

[16] Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks.
arXiv preprint arXiv:1803.06567, 2018.

[17] Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. arXiv preprint arXiv:1805.12514, 2018.

[18] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.

[19] Tim Van Erven and Peter Harremos. Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014.

[20] Anonymous. ℓ1 adversarial robustness certificates: a randomized smoothing approach. In Submitted to International Conference on Learning Representations, 2020. Under review.

[21] Hongseok Namkoong and John C Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems, pages 2971–2980, 2017.

[22] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. arXiv preprint arXiv:1710.10571, 2017.

[23] Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems, pages 1632–1640, 2016.

[24] Nic Ford, Justin Gilmer, Nicolas Carlini, and Dogus Cubuk. Adversarial examples are a natural consequence of test error in noise. arXiv preprint arXiv:1901.10513, 2019.

[25] T. B. Brown, N. Carlini, C. Zhang, C. Olsson, P. Christiano, and I. Goodfellow. Unrestricted adversarial examples. arXiv preprint arXiv:1809.08352, 2018.

[26] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks.
arXiv preprint arXiv:1706.06083, 2017.

[27] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.

[28] Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. Efficient defenses against adversarial attacks. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 39–49. ACM, 2017.

[29] Nicholas Carlini and David Wagner. MagNet and "Efficient defenses against adversarial attacks" are not robust to adversarial examples. arXiv preprint arXiv:1711.08478, 2017.

[30] Junyuan Xie, Linli Xu, and Enhong Chen. Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems, pages 341–349, 2012.

[31] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.

[32] Philip Bachman, Ouais Alsharif, and Doina Precup. Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems, pages 3365–3373, 2014.

[33] Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4480–4488, 2016.

[34] Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.

[35] Logan Engstrom, Andrew Ilyas, and Anish Athalye. Evaluating and understanding the robustness of adversarial logit pairing. arXiv preprint arXiv:1807.10272, 2018.

[36] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses.
arXiv preprint arXiv:1705.07204, 2017.

[37] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.

[38] Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, and Aleksander Madry. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019.

[39] Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. On norm-agnostic robustness of adversarial training, 2019.

[40] Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.