{"title": "Improving Black-box Adversarial Attacks with a Transfer-based Prior", "book": "Advances in Neural Information Processing Systems", "page_first": 10934, "page_last": 10944, "abstract": "We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes the advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm by an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires much fewer queries to attack black-box models with higher success rates compared with the alternative state-of-the-art methods.", "full_text": "Improving Black-box Adversarial Attacks with a\n\nTransfer-based Prior\n\nShuyu Cheng\u2217, Yinpeng Dong\u2217, Tianyu Pang, Hang Su, Jun Zhu\u2020\n\nDept. of Comp. Sci. and Tech., BNRist Center, State Key Lab for Intell. Tech. & Sys.,\n\nInstitute for AI, THBI Lab, Tsinghua University, Beijing, 100084, China\n\n{chengsy18, dyp17, pty17}@mails.tsinghua.edu.cn, {suhangss, dcszj}@mail.tsinghua.edu.cn\n\nAbstract\n\nWe consider the black-box adversarial setting, where the adversary has to gen-\nerate adversarial perturbations without access to the target models to compute\ngradients. 
Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm with an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires much fewer queries to attack black-box models with higher success rates compared with the alternative state-of-the-art methods.

1 Introduction

Although deep neural networks (DNNs) have achieved significant success on various tasks [12], they have been shown to be vulnerable to adversarial examples [2, 33, 13], which are crafted to fool the models by modifying normal examples with human-imperceptible perturbations. Many efforts have been devoted to studying the generation of adversarial examples, which is crucial to identify the weaknesses of deep learning algorithms [33, 1], serves as a surrogate to evaluate robustness [5], and consequently contributes to the design of robust deep learning models [24].
In general, adversarial attacks can be categorized into white-box attacks and black-box attacks. In the white-box setting, the adversary has full access to the model, and can use various gradient-based methods [13, 20, 5, 24] to generate adversarial examples. In the more challenging black-box setting, the adversary has no or limited knowledge about the model, and crafts adversarial examples without any gradient information. 
The black-box setting is more practical in many real-world situations. Many methods [30, 6, 3, 7, 18, 27, 35, 19, 9] have been proposed to perform black-box adversarial attacks. A common idea is to use an approximate gradient instead of the true gradient for crafting adversarial examples. The approximate gradient could be either the gradient of a surrogate model (termed transfer-based attacks) or numerically estimated by zeroth-order optimization methods (termed query-based attacks). In transfer-based attacks, adversarial examples generated for a different model are likely to remain adversarial for the target model due to the transferability [29]. Although various methods [7, 23, 8] have been introduced to improve the transferability, the attack success rate is still unsatisfactory. The reason is that transfer-based attacks lack an adjustment procedure for the case when the gradient of the surrogate model points to a non-adversarial region of the target model. In query-based adversarial attacks, the gradient can be estimated by various methods, such as finite difference [6, 27], random gradient estimation [35], and natural evolution strategy [18]. These methods usually result in a higher attack success rate compared with the transfer-based attack methods [6, 27], but they require a tremendous number of queries to perform a successful attack. The inefficiency mainly comes from the underutilization of priors, since the current methods are already nearly optimal for gradient estimation [19].
To address the aforementioned problems and improve black-box attacks, we propose a prior-guided random gradient-free (P-RGF) method to utilize the transfer-based prior for query-efficient black-box attacks under the gradient estimation framework. 

*Equal contribution. †Corresponding author.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

The transfer-based prior is given by the gradient of a surrogate white-box model, which contains abundant prior knowledge of the true gradient. Our method provides a gradient estimate by querying the target model with random samples that are biased towards the transfer gradient and acquiring the corresponding loss values. We provide a theoretical analysis on deriving the optimal coefficient, which controls the strength of the transfer gradient. Our method is also flexible to integrate other forms of prior information. As a concrete example, we incorporate the commonly used data-dependent prior [19] into our algorithm along with the transfer-based prior. Extensive experiments demonstrate that our method significantly outperforms the previous state-of-the-art methods in terms of black-box attack success rate and query efficiency, which verifies the superiority of our method for black-box adversarial attacks.

2 Background

In this section, we review the background and the related work on black-box adversarial attacks.

2.1 Adversarial setup

Given a classifier C(x) and an input-label pair (x, y), the goal of attacks is to generate an adversarial example x^{adv} that is misclassified while the distance between the adversarial input and the normal input measured by the \ell_p norm is smaller than a preset threshold \epsilon, as

C(x^{adv}) \neq y, \quad \text{s.t.} \quad \|x^{adv} - x\|_p \leq \epsilon. \qquad (1)

Note that this corresponds to the untargeted attack. We present our framework and algorithm based on the untargeted attack for clarity, while the extension to the targeted one is straightforward.
An adversarial example can be generated by solving the constrained optimization problem

x^{adv} = \arg\max_{x': \|x' - x\|_p \leq \epsilon} f(x', y), \qquad (2)

where f is a loss function on top of the classifier C(x), e.g., the cross-entropy loss. 
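As a concrete illustration, the adversarial setup of Eqs. (1)-(2) can be sketched in a few lines of numpy. This is our own minimal sketch, not the paper's code; `logits_adv` stands in for the output of a hypothetical classifier C on the candidate x^{adv}:

```python
import numpy as np

def cross_entropy_loss(logits, y):
    # f(x, y): cross-entropy loss on top of the classifier's logits
    z = logits - logits.max()                 # stabilized log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[y]

def is_successful_attack(x_adv, x, y, logits_adv, eps, p=2):
    # Eq. (1): misclassified AND within the l_p ball of radius eps around x
    misclassified = int(np.argmax(logits_adv)) != y
    within_ball = np.linalg.norm((x_adv - x).ravel(), ord=p) <= eps
    return misclassified and within_ball
```

Maximizing `cross_entropy_loss` over the \ell_p ball, as in Eq. (2), is what the gradient-based attacks below perform.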
Many gradient-based methods [13, 20, 5, 24] have been proposed to solve this optimization problem. The state-of-the-art projected gradient descent (PGD) [24] iteratively generates adversarial examples as

x^{adv}_{t+1} = \Pi_{\mathcal{B}_p(x,\epsilon)}\big(x^{adv}_t + \eta \cdot g_t\big), \qquad (3)

where \Pi is the projection operation, \mathcal{B}_p(x, \epsilon) is the \ell_p ball centered at x with radius \epsilon, \eta is the step size, and g_t is the normalized gradient under the \ell_p norm, e.g., g_t = \frac{\nabla_x f(x^{adv}_t, y)}{\|\nabla_x f(x^{adv}_t, y)\|_2} under the \ell_2 norm, and g_t = \mathrm{sign}(\nabla_x f(x^{adv}_t, y)) under the \ell_\infty norm. This method requires full access to the gradient of the target model, which is designed under the white-box attack setting.

2.2 Black-box attacks

The direct access to the model gradient is unrealistic in many real-world applications, where we need to perform attacks in the black-box manner. We can still adopt the PGD method to generate adversarial examples, except that the true gradient \nabla_x f(x, y) is usually replaced by an approximate gradient. Black-box attacks can be roughly divided into transfer-based attacks and query-based attacks. Transfer-based attacks adopt the gradient of a surrogate white-box model to generate adversarial examples, which are likely to fool the black-box model due to the transferability [30, 23, 7]. Query-based attacks estimate the gradient by zeroth-order optimization methods, when the loss values could be accessed through queries. Chen et al. [6] propose to use the symmetric difference quotient [21] to estimate the gradient at each coordinate as

\hat{g}_i = \frac{f(x + \sigma e_i, y) - f(x - \sigma e_i, y)}{2\sigma} \approx \frac{\partial f(x, y)}{\partial x_i}, \qquad (4)

where \sigma is a small constant, and e_i is the i-th unit basis vector. 
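The coordinate-wise estimation of Eq. (4) can be sketched as follows (our own illustration; `f` is a generic scalar callable standing in for f(·, y) of the black-box model). Note that it spends two queries per input dimension:

```python
import numpy as np

def coordinate_fd_gradient(f, x, sigma=1e-4):
    # Symmetric difference quotient of Eq. (4), one coordinate at a time.
    # Costs 2 queries per dimension, i.e., 2D queries in total.
    D = x.size
    g = np.zeros(D)
    for i in range(D):
        e_i = np.zeros(D)
        e_i[i] = sigma
        g[i] = (f(x + e_i) - f(x - e_i)) / (2.0 * sigma)
    return g
```

For a smooth f the estimate approaches the true gradient as \sigma \to 0; the 2D query cost is exactly the bottleneck discussed next.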
Although query-efficient mechanisms have been developed [6, 27], the coordinate-wise gradient estimation inherently results in the query complexity being proportional to the input dimension D, which is prohibitively large with high-dimensional input space, e.g., D ≈ 270,000 for ImageNet [31]. To improve query efficiency, the approximated gradient \hat{g} can be estimated by the random gradient-free (RGF) method [26, 11, 10] as

\hat{g} = \frac{1}{q} \sum_{i=1}^{q} \hat{g}_i, \quad \text{where} \quad \hat{g}_i = \frac{f(x + \sigma u_i, y) - f(x, y)}{\sigma} \cdot u_i, \qquad (5)

where \{u_i\}_{i=1}^{q} are the random vectors independently sampled from a distribution P on \mathbb{R}^D, and \sigma is the parameter to control the sampling variance. It is noted that \hat{g}_i \to u_i^\top \nabla_x f(x, y) \cdot u_i when \sigma \to 0, which is nearly an unbiased estimator of the gradient [10] when \mathbb{E}[u_i u_i^\top] = I. \hat{g} is the average estimation over q random directions to reduce the variance. The natural evolution strategy (NES) [18] is another variant of Eq. (5), which conducts the antithetic sampling over a Gaussian distribution. Ilyas et al. [19] show that these methods are nearly optimal to estimate the gradient, but their query efficiency could be improved by incorporating informative priors. They identify the time and data-dependent priors for black-box attacks. Different from the alternative methods, our proposed transfer-based prior is more effective as shown in the experiments. Moreover, the transfer-based prior can also be used together with other priors. We demonstrate the flexibility of our algorithm by incorporating the commonly used data-dependent prior as an example.

2.3 Black-box attacks based on both transferability and queries

There are also several works that adopt both the transferability of adversarial examples and the model queries for black-box attacks. Papernot et al. 
[30, 29] train a local substitute model to mimic the black-box model with a synthetic dataset, in which the labels are given by the black-box model through queries. Then the black-box model is attacked by the adversarial examples generated for the substitute model based on the transferability. A meta-model [28] can reverse-engineer the black-box model and predict its attributes (such as architecture, optimization procedure, and training data) through a sequence of queries. Given the predicted attributes of the black-box model, the attacker can find similar surrogate models, which are better suited to craft transferable adversarial examples against the black-box model. These methods all use queries to obtain knowledge of the black-box model, and train/find surrogate models to generate adversarial examples, with the purpose of improving the transferability. However, we do not optimize the surrogate model, but focus on utilizing the gradient of a fixed surrogate model to obtain a more accurate gradient estimate.
A recent work [4] also uses the gradient of a surrogate model to improve the efficiency of query-based black-box attacks. This method focuses on a different attack scenario, where the model only provides the hard-label outputs, while we consider the setting where the loss values could be accessed. Moreover, this method controls the strength of the transfer gradient by a preset hyperparameter, whereas we obtain its optimal value through a theoretical analysis based on the gradient estimation framework. It is worth mentioning that a similar but independent work [25] also uses surrogate gradients to improve zeroth-order optimization, but they did not apply their method to black-box adversarial attacks.

3 Methodology

In this section, we first introduce the gradient estimation framework. Then we propose the prior-guided random gradient-free (P-RGF) algorithm. 
We further incorporate the data-dependent prior [19] into our algorithm. We also provide an alternative algorithm for the same purpose in Appendix B.

3.1 Gradient estimation framework

The key challenge in black-box adversarial attacks is to estimate the gradient of a model, which can be used to conduct gradient-based attacks. In this paper, we aim to estimate the gradient \nabla_x f(x, y) of the black-box model f more accurately to improve black-box attacks. We denote the gradient \nabla_x f(x, y) by \nabla f(x) in the following for simplicity. We assume that \nabla f(x) \neq 0 in this paper. The objective of gradient estimation is to find the best estimator, which approximates the true gradient \nabla f(x) by reaching the minimum value of the loss function as

\hat{g}^{*} = \arg\min_{\hat{g} \in \mathcal{G}} L(\hat{g}), \qquad (6)

where \hat{g} is a gradient estimator given by any estimation algorithm, \mathcal{G} is the set of all possible gradient estimators, and L(\hat{g}) is a loss function to measure the performance of the estimator \hat{g}. Specifically, we let the loss function of the gradient estimator \hat{g} be

L(\hat{g}) = \min_{b \geq 0} \mathbb{E}\|\nabla f(x) - b\hat{g}\|_2^2, \qquad (7)

where the expectation is taken over the randomness of the estimation algorithm to obtain \hat{g}. The loss L(\hat{g}) is the minimum expected squared \ell_2 distance between the true gradient \nabla f(x) and the scaled estimator b\hat{g}. The previous work [35] also uses the expected squared \ell_2 distance \mathbb{E}\|\nabla f(x) - \hat{g}\|_2^2 as the loss function, which is similar to ours. However, the value of this loss function will change with different magnitude of the estimator \hat{g}. In generating adversarial examples, the gradient is usually normalized [13, 24], such that the direction of the gradient estimator, instead of the magnitude, will affect the performance of attacks. 
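A quick numerical check (our own sketch, not part of the paper) that the loss in Eq. (7) depends only on the direction of \hat{g}: the inner minimization over b has the closed form b^{*} = \max(\mathbb{E}[\nabla f(x)^\top \hat{g}] / \mathbb{E}\|\hat{g}\|_2^2, 0), so rescaling every sample of \hat{g} leaves the loss unchanged:

```python
import numpy as np

def estimator_loss(grad, g_hat_samples):
    # L(g_hat) = min_{b >= 0} E || grad - b * g_hat ||_2^2  (Eq. (7)),
    # with the expectation replaced by a Monte-Carlo average over samples
    # of the (random) estimator g_hat.
    inner = np.mean([grad @ g for g in g_hat_samples])       # E[grad^T g_hat]
    sq_norm = np.mean([g @ g for g in g_hat_samples])        # E[||g_hat||^2]
    b_star = max(inner / sq_norm, 0.0)                       # optimal scaling
    return float(np.mean([np.sum((grad - b_star * g) ** 2)
                          for g in g_hat_samples]))
```

Scaling all samples by any c > 0 scales b^{*} by 1/c, so the residuals, and hence the loss, are identical; this is precisely why only the estimator's direction matters.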
Thus, we incorporate a scaling factor b in Eq. (7) and minimize the error w.r.t. b, which can neglect the impact of the magnitude on the loss of the estimator \hat{g}.

3.2 Prior-guided random gradient-free method

In this section, we present the prior-guided random gradient-free (P-RGF) method, which is a variant of the random gradient-free (RGF) method. Recall that in RGF, the gradient can be estimated via a set of random vectors \{u_i\}_{i=1}^{q} as in Eq. (5) with q being the number of random vectors. Directly using RGF without prior information will result in poor query efficiency as shown in our experiments. In our method, we propose to sample the random vectors that are biased towards the transfer gradient, to fully exploit the prior information.
Let v be the normalized transfer gradient of a surrogate model such that \|v\|_2 = 1, and the cosine similarity between the transfer gradient and the true gradient be

\alpha = v^\top \overline{\nabla f}(x), \quad \text{with} \quad \overline{\nabla f}(x) = \|\nabla f(x)\|_2^{-1} \nabla f(x), \qquad (8)

where \overline{\nabla f}(x) is the \ell_2 normalization of the true gradient \nabla f(x).¹ We assume that \alpha \geq 0 without loss of generality, since we can reassign v \leftarrow -v when \alpha < 0. Although the true value of \alpha is unknown, we could estimate it efficiently, which will be introduced in Sec. 3.3.
For the RGF estimator \hat{g} in Eq. (5), we further assume that the sampling distribution P is defined on the unit hypersphere in the D-dimensional space, such that the random vectors \{u_i\}_{i=1}^{q} drawn from P satisfy \|u_i\|_2 = 1. Then, we can represent the loss of the RGF estimator by the following theorem.
Theorem 1. 
(Proof in Appendix A.1) If f is differentiable at x, the loss of the RGF estimator \hat{g} is

\lim_{\sigma \to 0} L(\hat{g}) = \|\nabla f(x)\|_2^2 - \frac{\big(\nabla f(x)^\top C \nabla f(x)\big)^2}{\big(1 - \frac{1}{q}\big) \nabla f(x)^\top C^2 \nabla f(x) + \frac{1}{q} \nabla f(x)^\top C \nabla f(x)}, \qquad (9)

where \sigma is the sampling variance, C = \mathbb{E}[u_i u_i^\top] with u_i being the random vector, \|u_i\|_2 = 1, and q is the number of random vectors as in Eq. (5).

Given the definition of C, it needs to satisfy two constraints: (1) it should be positive semi-definite; (2) its trace should be 1 since \mathrm{Tr}(C) = \mathbb{E}[\mathrm{Tr}(u_i u_i^\top)] = \mathbb{E}[u_i^\top u_i] = 1. It is noted from Theorem 1 that we can minimize L(\hat{g}) by optimizing C, i.e., we can achieve an optimal gradient estimator by carefully sampling the random vectors u_i, yielding a query-efficient adversarial attack.

Specifically, C can be decomposed as C = \sum_{i=1}^{D} \lambda_i v_i v_i^\top, where \{\lambda_i\}_{i=1}^{D} and \{v_i\}_{i=1}^{D} are the eigenvalues and orthonormal eigenvectors of C, and \sum_{i=1}^{D} \lambda_i = 1. In our method, since we propose to bias u_i towards v to exploit its prior information, we can specify an eigenvector to be v, and let the corresponding eigenvalue be a tunable coefficient. For the other eigenvalues, we set them to be equal since we do not have any prior knowledge about the other eigenvectors. In this case, we let

C = \lambda v v^\top + \frac{1 - \lambda}{D - 1}\big(I - v v^\top\big), \qquad (10)

where \lambda \in [0, 1] controls the strength of the transfer gradient that the random vectors \{u_i\}_{i=1}^{q} are biased towards. We can easily construct a random vector with unit length while satisfying Eq. 
(10) (proof in Appendix A.2) as

u_i = \sqrt{\lambda} \cdot v + \sqrt{1 - \lambda} \cdot \overline{(I - v v^\top)\xi_i}, \qquad (11)

where \xi_i is sampled uniformly from the unit hypersphere.

¹We will use \bar{e} to denote the \ell_2 normalization of a vector e in this paper.

Algorithm 1 Prior-guided random gradient-free (P-RGF) method
Input: The black-box model f; input x and label y; the normalized transfer gradient v; sampling variance \sigma; number of queries q; input dimension D.
Output: Estimate of the gradient \nabla f(x).
1: Estimate the cosine similarity \alpha = v^\top \overline{\nabla f}(x) (detailed in Sec. 3.3);
2: Calculate \lambda^{*} according to Eq. (12) given \alpha, q, and D;
3: if \lambda^{*} = 1 then
4:   return v;
5: end if
6: \hat{g} \leftarrow 0;
7: for i = 1 to q do
8:   Sample \xi_i from the uniform distribution on the D-dimensional unit hypersphere;
9:   u_i = \sqrt{\lambda^{*}} \cdot v + \sqrt{1 - \lambda^{*}} \cdot \overline{(I - v v^\top)\xi_i};
10:  \hat{g} \leftarrow \hat{g} + \frac{f(x + \sigma u_i, y) - f(x, y)}{\sigma} \cdot u_i;
11: end for
12: return \nabla f(x) \leftarrow \frac{1}{q} \hat{g}.

Hereby, the problem turns to optimizing \lambda to minimize L(\hat{g}). The previous work [35] can also be categorized as a special case of our method when \lambda = \frac{1}{D} and C = \frac{1}{D} I, such that the random vectors are drawn from the uniform distribution on the hypersphere. When \lambda \in [0, \frac{1}{D}), it indicates that the transfer gradient is worse than a random vector, so we are encouraged to search in other directions by using a small \lambda. To find the optimal \lambda, we plug Eq. (10) into Eq. 
(9), and obtain the closed-form solution (proof in Appendix A.3) as

\lambda^{*} = \begin{cases} 0, & \text{if } \alpha^2 \leq \frac{1}{D + 2q - 2}, \\[4pt] \frac{(1 - \alpha^2)\big(\alpha^2 (D + 2q - 2) - 1\big)}{2\alpha^2 D q - \alpha^4 D (D + 2q - 2) - 1}, & \text{if } \frac{1}{D + 2q - 2} < \alpha^2 < \frac{2q - 1}{D + 2q - 2}, \\[4pt] 1, & \text{if } \alpha^2 \geq \frac{2q - 1}{D + 2q - 2}. \end{cases} \qquad (12)

Remark. It can be proven (in Appendix A.4) that \lambda^{*} is a monotonically increasing function of \alpha^2, and a monotonically decreasing function of q (when \alpha^2 > \frac{1}{D}). It means that a larger \alpha or a smaller q (when the transfer gradient is not worse than a random vector) would result in a larger \lambda^{*}, which makes sense since we tend to rely on the transfer gradient more when (1) it approximates the true gradient better; (2) the number of queries is not enough to provide much gradient information.
We summarize the P-RGF method in Algorithm 1. Note that when \lambda^{*} = 1, we directly return the transfer gradient as the estimate of \nabla f(x), which can save many queries.

3.3 Estimation of cosine similarity

To complete our algorithm, we also need to estimate \alpha = v^\top \overline{\nabla f}(x) = v^\top \frac{\nabla f(x)}{\|\nabla f(x)\|_2}, where v is the normalized transfer gradient. Note that the inner product v^\top \nabla f(x) can be easily estimated by the finite difference method

v^\top \nabla f(x) \approx \frac{f(x + \sigma v, y) - f(x, y)}{\sigma}, \qquad (13)

using a small \sigma. Hence, the problem is reduced to estimating \|\nabla f(x)\|_2.
Suppose that it is allowed to conduct S queries to estimate \|\nabla f(x)\|_2. We first draw a different set of S random vectors \{w_s\}_{s=1}^{S} independently and uniformly from the D-dimensional unit hypersphere, and then estimate w_s^\top \nabla f(x) using Eq. 
(13). Suppose that we have an r-degree homogeneous function g of S variables, i.e., g(az) = a^r g(z) where a \in \mathbb{R} and z \in \mathbb{R}^S; then we have

g\big(W^\top \nabla f(x)\big) = \|\nabla f(x)\|_2^r \cdot g\big(W^\top \overline{\nabla f}(x)\big), \qquad (14)

where W = [w_1, ..., w_S] is the collection of the random vectors. In this case, the norm of the gradient \|\nabla f(x)\|_2 could be computed easily if both g\big(W^\top \nabla f(x)\big) and g\big(W^\top \overline{\nabla f}(x)\big) are available. Note that g\big(W^\top \nabla f(x)\big) can be calculated since each w_s^\top \nabla f(x) is available.

However, it is non-trivial to obtain the value of w_s^\top \overline{\nabla f}(x) as well as the function value g\big(W^\top \overline{\nabla f}(x)\big). Nevertheless, we note that the distribution of w_s^\top \overline{\nabla f}(x) is the same regardless of the direction of \overline{\nabla f}(x), thus we can compute the expectation of the function value \mathbb{E}\big[g\big(W^\top \overline{\nabla f}(x)\big)\big]. Based on that, we use \frac{g(W^\top \nabla f(x))}{\mathbb{E}[g(W^\top \overline{\nabla f}(x))]} as an unbiased estimator of \|\nabla f(x)\|_2^r. In particular, we choose g as g(z) = \frac{1}{S}\sum_{s=1}^{S} z_s^2. Then r = 2, and we have

\mathbb{E}\big[g\big(W^\top \overline{\nabla f}(x)\big)\big] = \mathbb{E}\big[(w_1^\top \overline{\nabla f}(x))^2\big] = \overline{\nabla f}(x)^\top \mathbb{E}[w_1 w_1^\top] \overline{\nabla f}(x) = \frac{1}{D}. \qquad (15)

By plugging Eq. (15) into Eq. 
(14), we can estimate the gradient norm by

\|\nabla f(x)\|_2 \approx \sqrt{\frac{D}{S} \sum_{s=1}^{S} \big(w_s^\top \nabla f(x)\big)^2} \approx \sqrt{\frac{D}{S} \sum_{s=1}^{S} \Big(\frac{f(x + \sigma w_s, y) - f(x, y)}{\sigma}\Big)^2}. \qquad (16)

To save queries, we estimate the gradient norm periodically instead of in every iteration, since usually it does not change very fast in the optimization process.

3.4 Incorporating the data-dependent prior

The proposed P-RGF method is generally flexible to integrate other priors. As a concrete example, we incorporate the commonly used data-dependent prior [19] along with the transfer-based prior into our algorithm. The data-dependent prior is proposed to reduce query complexity, which suggests that we can utilize the structure of the inputs to reduce the input-space dimension without sacrificing much estimation accuracy. This idea has also been adopted in several works [6, 35, 14, 4]. We observe that many works restrict the adversarial perturbations to lie in a linear subspace of the input space, which allows the application of our theoretical framework.
Consider the RGF estimator in Eq. (5). To leverage the data-dependent prior, suppose u_i = V\xi_i, where V = [v_1, v_2, ..., v_d] is a D \times d matrix (d < D), \{v_j\}_{j=1}^{d} is an orthonormal basis in the d-dimensional subspace of the input space, and \xi_i is a random vector sampled from the d-dimensional unit hypersphere. When \xi_i is sampled from the uniform distribution, C = \frac{1}{d}\sum_{i=1}^{d} v_i v_i^\top.
Specifically, we focus on the data-dependent prior in [19]. In this method, the random vector \xi_i drawn in \mathbb{R}^d is up-sampled to u_i in \mathbb{R}^D by the nearest neighbor algorithm, where d < D. 
The orthonormal basis \{v_j\}_{j=1}^{d} can be obtained by first up-sampling the standard basis in \mathbb{R}^d with the same method and then applying normalization.
Now we consider incorporating the data-dependent prior into our algorithm. Similar to Eq. (10), we let one eigenvector of C be v to exploit the transfer-based prior, and the others are given by the orthonormal basis in the subspace to exploit the data-dependent prior, as

C = \lambda v v^\top + \frac{1 - \lambda}{d} \sum_{i=1}^{d} v_i v_i^\top. \qquad (17)

By plugging Eq. (17) into Eq. (9), we can also obtain the optimal \lambda (proof in Appendix A.5) as

\lambda^{*} = \begin{cases} 0, & \text{if } \alpha^2 \leq \frac{A^2}{d + 2q - 2}, \\[4pt] \frac{A^2\big(A^2 - \alpha^2 (d + 2q - 2)\big)}{A^4 + \alpha^4 d^2 - 2A^2\alpha^2 (q + dq - 1)}, & \text{if } \frac{A^2}{d + 2q - 2} < \alpha^2 < \frac{A^2(2q - 1)}{d}, \\[4pt] 1, & \text{if } \alpha^2 \geq \frac{A^2(2q - 1)}{d}, \end{cases} \qquad (18)

where A^2 = \sum_{i=1}^{d} \big(v_i^\top \overline{\nabla f}(x)\big)^2. Note that A should also be estimated. We use a similar method to the one for estimating \alpha in Sec. 3.3, which is provided in Appendix C.
The remaining problem is to construct a random vector u_i satisfying \mathbb{E}[u_i u_i^\top] = C, with C defined in Eq. (17). In general, it is difficult since v is not orthogonal to the subspace. To address this issue, we sample u_i in a way that \mathbb{E}[u_i u_i^\top] is a good approximation of C (explanation in Appendix A.6), which is similar to Eq. (11) as

u_i = \sqrt{\lambda} \cdot v + \sqrt{1 - \lambda} \cdot \overline{(I - v v^\top)V\xi_i}, \qquad (19)

Figure 1: (a) The crafted adversarial examples for the Inception-v3 [34] model by RGF and our P-RGF w.r.t. number of queries. We show the cross-entropy loss of each image. The images in the green boxes are successfully misclassified, while those in the red boxes are not. 
(b) The estimation error of gradient norm with different S. (c) The average cosine similarity between the estimated gradient and the true gradient. The estimate is given by our method with fixed \lambda and optimal \lambda, respectively. (d) The average \lambda^{*} across attack iterations. (e) The average cosine similarity between the transfer and the true gradients, and that between the estimated and the true gradients, across attack iterations.

where \xi_i is sampled uniformly from the d-dimensional unit hypersphere.
Our algorithm with the data-dependent prior is similar to Algorithm 1. We first estimate \alpha and A, and then calculate \lambda^{*} by Eq. (18). If \lambda^{*} = 1, we use the transfer gradient v as the estimate. If not, we sample q random vectors by Eq. (19) and obtain the gradient estimation by Eq. (5).

4 Experiments

In this section, we present the experimental results to demonstrate the effectiveness of the proposed method on attacking black-box classifiers.² We perform untargeted attacks under both the \ell_2 and \ell_\infty norms on the ImageNet dataset [31]. We choose 1,000 images randomly from the validation set for evaluation. Due to the space limitation, we leave the results based on the \ell_\infty norm in Appendix D. The results for both norms are consistent. For all experiments, we use the ResNet-152 model [17] as the surrogate model to generate the transfer gradient. We apply the PGD algorithm [24] to generate adversarial examples with the estimated gradient given by each method. We set the perturbation size as \epsilon = \sqrt{D} \cdot 0.001 and the learning rate as \eta = 2 in PGD under the \ell_2 norm, with images in [0, 1].

4.1 Performance of gradient estimation

In this section, we conduct several experiments to show the performance of gradient estimation. 
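For concreteness, the estimator evaluated in these experiments, i.e., Algorithm 1 with the optimal \lambda^{*} of Eq. (12), can be sketched in numpy as follows. This is our own condensed sketch, not the released code: `loss` is a hypothetical scalar callable standing in for f(·, y) of the black-box model, and `alpha` is assumed to be already estimated as in Sec. 3.3:

```python
import numpy as np

def lambda_star(alpha_sq, q, D):
    # Optimal mixing coefficient of Eq. (12) as a function of alpha^2, q, D.
    lo = 1.0 / (D + 2 * q - 2)
    hi = (2.0 * q - 1.0) / (D + 2 * q - 2)
    if alpha_sq <= lo:
        return 0.0
    if alpha_sq >= hi:
        return 1.0
    num = (1 - alpha_sq) * (alpha_sq * (D + 2 * q - 2) - 1)
    den = 2 * alpha_sq * D * q - alpha_sq**2 * D * (D + 2 * q - 2) - 1
    return num / den

def p_rgf_gradient(loss, x, v, alpha, q=50, sigma=1e-4):
    # Sketch of Algorithm 1: gradient estimation with random directions
    # biased towards the normalized transfer gradient v.
    D = x.size
    lam = lambda_star(alpha**2, q, D)
    if lam == 1.0:
        return v.copy()              # rely entirely on the transfer prior
    f0 = loss(x)
    g_hat = np.zeros(D)
    for _ in range(q):
        # A Gaussian projected off v and normalized is uniform on the unit
        # sphere of the orthogonal complement, matching Eq. (11).
        xi = np.random.randn(D)
        w = xi - (v @ xi) * v
        w /= np.linalg.norm(w)
        u = np.sqrt(lam) * v + np.sqrt(1 - lam) * w   # unit norm by construction
        g_hat += (loss(x + sigma * u) - f0) / sigma * u
    return g_hat / q
```

Note that when \alpha^2 exceeds the upper threshold of Eq. (12), the sketch returns v directly and spends no queries on random directions, mirroring lines 3-5 of Algorithm 1.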
All experiments in this section are performed on the Inception-v3 [34] model.
First, we show the performance of gradient norm estimation in Sec. 3.3. The gradient norm (or cosine similarity) is easier to estimate than the true gradient since it is a scalar value. Fig. 1(b) shows the estimation error of the gradient norm, defined as the (normalized) RMSE

\sqrt{\mathbb{E}\Big(\frac{\widehat{\|\nabla f(x)\|_2} - \|\nabla f(x)\|_2}{\|\nabla f(x)\|_2}\Big)^2},

w.r.t. the number of queries S, where \|\nabla f(x)\|_2 is the true norm and \widehat{\|\nabla f(x)\|_2} is the estimated one. We choose S = 10 in all experiments to reduce the number of queries while the estimation error is acceptable. We also estimate the gradient norm every 10 attack iterations in all experiments to reduce the number of queries, since usually its value is relatively stable in the optimization process.

²Our code is available at: https://github.com/thu-ml/Prior-Guided-RGF.

Table 1: The experimental results of black-box attacks against Inception-v3, VGG-16, and ResNet-50 under the \ell_2 norm. We report the attack success rate (ASR) and the average number of queries (AVG. Q) needed to generate an adversarial example over successful attacks.

Methods            | Inception-v3 (ASR / AVG. Q) | VGG-16 (ASR / AVG. Q) | ResNet-50 (ASR / AVG. Q)
NES [18]           | 95.5% / 1718                | 98.7% / 1081          | 98.4% / 969
BanditsT [19]      | 92.4% / 1560                | 94.0% / 584           | 96.2% / 1076
BanditsTD [19]     | 97.2% / 874                 | 94.9% / 278           | 96.8% / 512
AutoZoom [35]      | 85.4% / 2443                | 96.2% / 1589          | 94.8% / 2065
RGF                | 97.7% / 1309                | 99.8% / 935           | 99.5% / 809
P-RGF (λ = 0.5)    | 96.5% / 1119                | 97.3% / 1075          | 98.3% / 990
P-RGF (λ = 0.05)   | 97.8% / 1021                | 99.7% / 888           | 99.6% / 790
P-RGF (λ*)         | 98.1% / 745                 | 99.8% / 521           | 99.6% / 452
RGF_D              | 99.1% / 910                 | 100.0% / 464          | 99.8% / 521
P-RGF_D (λ = 0.5)  | 98.2% / 1047                | 99.3% / 917           | 99.3% / 893
P-RGF_D (λ = 0.05) | 99.1% / 754                 | 99.9% / 482           | 99.6% / 526
P-RGF_D (λ*)       | 99.1% / 649                 | 99.7% / 370           | 99.6% / 352

Figure 2: We show the average number of queries per successful image at any desired success rate.

Second, we verify the effectiveness of the derived optimal \lambda (i.e., \lambda^{*}) in Eq. (12) for gradient estimation, compared with any fixed \lambda \in [0, 1]. We perform attacks against Inception-v3 using P-RGF with \lambda^{*}, and calculate the cosine similarity between the estimated gradient and the true gradient. We calculate \lambda^{*} using the estimated \alpha in Sec. 3.3 instead of its true value. Meanwhile, along the PGD updates, we also use fixed \lambda to get gradient estimates, and calculate the corresponding cosine similarities. Note that \lambda^{*} does not correspond to any fixed value, since it varies during iterations. The average cosine similarities of different values of \lambda are shown in Fig. 1(c). 
It can be observed that when a suitable value of λ is chosen, the proposed P-RGF provides a better gradient estimate than both the original RGF method with uniform distribution (when λ = 1/D ≈ 0) and the transfer gradient alone (when λ = 1). Adopting λ* brings further improvement over any fixed λ, demonstrating the applicability of our theoretical framework.

Finally, we show the average λ* over all images w.r.t. attack iterations in Fig. 1(d). It shows that λ* decreases along with the iterations. Fig. 1(e) shows the average cosine similarity between the transfer and the true gradients, and that between the estimated and the true gradients, across iterations. The results show that the transfer gradient is useful at the beginning, but becomes less useful as the iterations proceed. However, the estimated gradient maintains a higher cosine similarity with the true gradient, which consequently facilitates the adversarial attacks. The results also confirm the need for the adaptive λ* at different attack iterations.

4.2 Results of black-box attacks on normal models

In this section, we perform attacks against three normally trained models: Inception-v3 [34], VGG-16 [32], and ResNet-50 [16]. We compare the proposed prior-guided random gradient-free (P-RGF) method with two baseline methods. The first is the original RGF method with uniform sampling. The second is the P-RGF method with a fixed λ, which is set to 0.5 or 0.05. In these methods, we set the number of queries as q = 50 and the sampling variance as σ = 0.0001·√D. We also incorporate the data-dependent prior into these three methods for comparison (denoted by adding a subscript "D"), with the dimension of the subspace set as d = 50 × 50 × 3. Besides, our method is compared with the state-of-the-art attack methods, including the natural evolution strategy (NES) [18], bandit optimization methods (Bandits_T and Bandits_TD) [19], and AutoZoom [35]. For all methods, we restrict the maximum number of queries for each image to 10,000. We report a successful attack if a method can generate an adversarial example within 10,000 queries and the size of the perturbation is smaller than the budget (i.e., ε = √(0.001·D)).

We show the success rate of black-box attacks and the average number of queries needed to generate an adversarial example over the successful attacks in Table 1. It can be seen that our method generally leads to higher attack success rates and requires much fewer queries than the other methods. Using a fixed λ cannot give satisfactory results, which demonstrates the necessity of using the optimal λ in our method. The results also show that the data-dependent prior is orthogonal to the proposed transfer-based prior, since integrating the data-dependent prior leads to better results. We show an example of attacks in Fig. 1(a). Fig. 2 shows the average number of queries over successful images needed to reach any desired success rate. Our method is much more query-efficient than the baseline methods.

4.3 Results of black-box attacks on defensive models

We further validate the effectiveness of the proposed method by attacking several defensive models, including JPEG compression [15], randomization [37], and guided denoiser [22]. We utilize the Inception-v3 model as the backbone classifier for the JPEG compression and randomization defenses. We compare P-RGF with RGF, NES [18], and SPSA [36]. The experimental settings are the same as those for attacking the normal models in Sec. 4.2. In our method, we use a smoothed version of the transfer gradient [8] as the transfer-based prior, since the smoothed transfer gradient is better suited to defeating defensive models.

Table 2: The experimental results of black-box attacks against JPEG compression [15], randomization [37], and guided denoiser [22] under the ℓ2 norm. We report the attack success rate (ASR) and the average number of queries (AVG. Q) needed to generate an adversarial example over successful attacks.

| Methods   | JPEG Compression [15] ASR | AVG. Q | Randomization [37] ASR | AVG. Q | Guided Denoiser [22] ASR | AVG. Q |
|-----------|---------------------------|--------|------------------------|--------|--------------------------|--------|
| NES [18]  | 47.3%                     | 3114   | 23.2%                  | 3632   | 48.0%                    | 3633   |
| SPSA [36] | 40.0%                     | 2744   | 9.6%                   | 3256   | 46.0%                    | 3526   |
| RGF       | 41.5%                     | 3126   | 19.5%                  | 3259   | 50.3%                    | 3569   |
| P-RGF     | 61.4%                     | 2419   | 60.4%                  | 2153   | 51.4%                    | 2858   |
| RGF_D     | 70.4%                     | 2828   | 54.9%                  | 2819   | 83.7%                    | 2230   |
| P-RGF_D   | 81.1%                     | 2120   | 82.3%                  | 1816   | 89.6%                    | 1784   |

The results in Table 2 further demonstrate the superiority of our method for attacking the defensive models. Our method leads to much higher attack success rates than the other methods (20% to 40% improvements in many cases), and also reduces the query complexity.

5 Conclusion

In this paper, we proposed a prior-guided random gradient-free (P-RGF) method that utilizes a transfer-based prior to improve black-box adversarial attacks. Our method appropriately integrates the transfer gradient of a surrogate white-box model through a theoretically derived optimal coefficient. The experimental results consistently demonstrate the effectiveness of our method, which requires much fewer queries to attack black-box models with higher success rates.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2017YFA0700904), NSFC Projects (Nos. 61620106010, 61621136008, 61571261), Beijing NSF Project (No.
L172037), Beijing Academy of Artificial Intelligence (BAAI), Tiangong Institute for Intelligent Computing, the JP Morgan Faculty Research Program, and the NVIDIA NVAIL Program with GPU/DGX Acceleration.

References

[1] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning (ICML), 2018.

[2] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402, 2013.

[3] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In International Conference on Learning Representations (ICLR), 2018.

[4] Thomas Brunner, Frederik Diehl, Michael Truong Le, and Alois Knoll. Guessing smart: Biased sampling for efficient black-box adversarial attacks. In The IEEE International Conference on Computer Vision (ICCV), 2019.

[5] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.

[6] Pin Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In ACM Workshop on Artificial Intelligence and Security (AISec), pages 15–26, 2017.

[7] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[8] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu.
Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[9] Yinpeng Dong, Hang Su, Baoyuan Wu, Zhifeng Li, Wei Liu, Tong Zhang, and Jun Zhu. Efficient decision-based black-box adversarial attacks on face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[10] John C Duchi, Michael I Jordan, Martin J Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 61(5):2788–2806, 2015.

[11] Saeed Ghadimi and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.

[12] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[13] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.

[14] Chuan Guo, Jared S Frank, and Kilian Q Weinberger. Low frequency adversarial perturbation. arXiv preprint arXiv:1809.08758, 2018.

[15] Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens Van Der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations (ICLR), 2018.

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision (ECCV), 2016.

[18] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin.
Black-box adversarial attacks with limited queries and information. In International Conference on Machine Learning (ICML), 2018.

[19] Andrew Ilyas, Logan Engstrom, and Aleksander Madry. Prior convictions: Black-box adversarial attacks with bandits and priors. In International Conference on Learning Representations (ICLR), 2019.

[20] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. The International Conference on Learning Representations (ICLR) Workshops, 2017.

[21] Peter D Lax and Maria Shea Terrell. Calculus with Applications. Springer, 2014.

[22] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, and Jun Zhu. Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[23] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations (ICLR), 2017.

[24] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.

[25] Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, and Jascha Sohl-Dickstein. Guided evolutionary strategies: Augmenting random search with surrogate gradients. In International Conference on Machine Learning (ICML), 2019.

[26] Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017.

[27] Arjun Nitin Bhagoji, Warren He, Bo Li, and Dawn Song. Practical black-box attacks on deep neural networks using efficient query mechanisms.
In European Conference on Computer Vision (ECCV), 2018.

[28] Seong Joon Oh, Max Augustin, Bernt Schiele, and Mario Fritz. Towards reverse-engineering black-box neural networks. In International Conference on Learning Representations (ICLR), 2018.

[29] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.

[30] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 2017.

[31] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.

[32] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2015.

[33] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.

[34] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[35] Chun-Chen Tu, Paishun Ting, Pin-Yu Chen, Sijia Liu, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh, and Shin-Ming Cheng. AutoZoom: Autoencoder-based zeroth order optimization method for attacking black-box neural networks.
In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.

[36] Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. In International Conference on Machine Learning (ICML), 2018.

[37] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations (ICLR), 2018.