Review for NeurIPS paper: Denoised Smoothing: A Provable Defense for Pretrained Classifiers

NeurIPS 2020

Review 1

Summary and Contributions: The paper proposes a denoising-based adversarial defense method. A custom-trained denoiser is applied to an image before it is passed to the target classifier, in order to remove any malicious noise.

Strengths: The problem addressed in the paper is interesting, and the proposed solution for handling such adversarial attacks is impressive.

Weaknesses: The paper has several weaknesses, ranging from the algorithm to the experimental evaluation.
1. The proposed algorithm appears to be merely a plug-and-play combination of existing algorithms such as Cohen et al., Salman et al., and Li et al. The authors therefore need to define their contribution clearly. From reading the paper, the only difference from several existing works seems to be that the target classifier is not retrained.
2. The claim on lines 28-29 that this is the first work is entirely wrong. Several works in the literature, based on compression, randomization, and mitigation, propose a separate pipeline for removing adversarial effects. The authors themselves mention some of these works on page 2, line 53.
3. The treatment of the white-box scenario is misleading. The authors must show that the proposed methodology remains secure even when the attacker has access to the denoiser.
4. The comparison with Cohen et al. shows that the proposed algorithm falls far short of being useful in this crucial direction, and it may end up as just another existing defense.
5. While the authors claim that the proposed algorithm can handle l_p norm attacks, the experimental setup is extremely unclear. Which existing attacks can the algorithm handle?
6. Are the improvements statistically significant?

Correctness: No proper theoretical justification is provided.

Clarity: The paper is hard to follow. The authors need to provide significant details in the main paper itself; having to look everything up in the supplementary material or appendix makes the document difficult to read. Please check the formatting as well.

Relation to Prior Work: The related work section is weak and needs significant improvement.

Reproducibility: No

Additional Feedback: The authors need to explain how the denoiser is trained and which parameters can affect its performance. Post rebuttal: No explanation of the sensitivity to unseen noise is provided. Can the denoiser handle unseen noise variations, for example when trained on Gaussian (l2) noise and tested on Laplace noise? In the rebuttal, the authors suggested training on each category of noise to handle it. I think this could limit practical implementation. The contribution relative to multiple existing works, including the experimental gains, is not properly discussed or justified. A discussion of, or preferably comparisons with, other certified defenses needs to be added.

Review 2

Summary and Contributions: The main contribution of this paper is an algorithm that makes a pre-trained model, which may be part of an API in the cloud, certifiably robust. The main promise is stated to be avoiding re-training of the original classifier. The general idea is to use the randomized smoothing of Cohen et al. as the basic building block and to apply an adaptive denoising module right before the input is fed to the pre-trained classifier. The denoising module can be trained with the MSE loss, or with a cross-entropy (CE) loss between the model's prediction on the Gaussian-noise-corrupted input (after denoising) and its prediction on the original input. The latter is called STAB by the authors. The idea is applicable in both the white-box and black-box settings of the base classifier. In the black-box setting, the authors suggest training a surrogate classifier for the STAB loss. The experimental results show non-trivial performance (certified l2 radius) compared to the case where no denoising is applied before feeding the data to the network.
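The pipeline summarized above (noise, then denoiser, then frozen classifier, then majority vote) can be sketched roughly as follows. This is a minimal, stdlib-only Python illustration under stated assumptions, not the authors' implementation: inputs are plain lists of floats, `toy_classifier` and `toy_denoiser` are hypothetical stand-ins for a pretrained network and a trained denoiser, the two training objectives are shown only as per-example loss functions (the training loop is omitted), and the lower confidence bound on the top-class probability uses a Hoeffding bound in place of the Clopper-Pearson bound used by Cohen et al. The certified l2 radius follows Cohen et al.: sigma * Phi^-1(p_A_lower).

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def mse_loss(denoised, clean):
    """MSE objective: mean squared error between denoised and clean inputs."""
    return sum((d - c) ** 2 for d, c in zip(denoised, clean)) / len(clean)


def stability_loss(probs_on_denoised, label_on_clean):
    """Stability (STAB) objective: cross-entropy of the classifier's class
    probabilities on D(x + noise) against the label it predicts on clean x.
    The probability is clamped to avoid log(0)."""
    return -math.log(max(probs_on_denoised[label_on_clean], 1e-12))


def sample_counts(classifier, denoiser, x, sigma, n, rng):
    """Classify n Gaussian-noised copies of x after denoising; count labels."""
    counts = Counter()
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[classifier(denoiser(noisy))] += 1
    return counts


def certify(classifier, denoiser, x, sigma, n0=100, n=1000, alpha=0.001, seed=0):
    """Return (label, certified l2 radius), or (None, 0.0) to abstain."""
    rng = random.Random(seed)
    # Selection: guess the top class from a small batch of noisy samples.
    top = sample_counts(classifier, denoiser, x, sigma, n0, rng).most_common(1)[0][0]
    # Estimation: fresh samples give a one-sided lower bound on p_A.
    # Hoeffding bound, swapped in for Cohen et al.'s Clopper-Pearson bound.
    counts = sample_counts(classifier, denoiser, x, sigma, n, rng)
    p_lower = counts[top] / n - math.sqrt(math.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return None, 0.0
    return top, sigma * NormalDist().inv_cdf(p_lower)


# Hypothetical toy components; the classifier stays frozen (never retrained).
def toy_classifier(v):
    return 1 if sum(v) > 0 else 0


def toy_denoiser(v):
    # Crude "denoiser": replace every coordinate with the mean coordinate.
    m = sum(v) / len(v)
    return [m] * len(v)


label, radius = certify(toy_classifier, toy_denoiser, [1.0, 1.0, 1.0, 1.0], sigma=0.25)
```

In this design only the denoiser would be trained (with `mse_loss`, `stability_loss`, or both), while `certify` treats the classifier as a black box that returns labels, which is what makes the approach compatible with APIs whose weights are not accessible.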

Strengths: The paper is fairly well written and clear enough. The idea is simple and makes it possible to provide certified robustness for the pre-trained model.

Weaknesses:
- By training a (surrogate) classifier on the original data in the black-box setting, the authors break the promise of avoiding any re-training that is made in the paper.
- The main aim of this work is to be of practical use in cloud services. Certified defenses are generally believed to yield lower robust accuracy than the adversarial training of Madry et al., which calls the applicability of this approach into question.
- Limited novelty: the idea of using input transformations for adversarial robustness is not novel (Lecuyer et al. (2018), as mentioned in the paper, uses a similar idea). The only difference is that the authors use this idea in the context of certified robustness and make the transformation adaptive to the original classifier.

Correctness: Yes.

Clarity: Yes, the paper is written clearly.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: I believe the authors should motivate their setting much better. They emphasize the need to avoid re-training, yet they still re-train in the black-box setting! They also mention lp robustness, for which the certified radius is known to depend intrinsically on the input dimension, except for the l1 and l2 norms (e.g. https://arxiv.org/abs/2002.03517, and https://arxiv.org/abs/2002.08118). This further calls the practicality of this approach into question. Looking at Figs. 3 and 4, the gap between the proposed method and the original method of Cohen et al. (which serves as an upper limit and the ideal behavior) seems to grow as the model becomes more complex (i.e. going from ResNet-18 to ResNet-50). Is there an explanation for this? It suggests that the method may not scale well. In addition, the authors should give some insight into why the STAB+MSE loss produces more corrupted outputs than the MSE loss (Fig. 5), even though the latter yields a worse certified radius.

Review 3

Summary and Contributions: The paper proposes denoised smoothing, which prepends a custom-trained denoiser to any pretrained classifier and then applies randomized smoothing to provably robustify the classifier. In particular, the denoiser is trained by combining the MSE and stability objectives. Experimental results demonstrate that the proposed strategy can improve the provable robustness of the pretrained classifier in both white-box and black-box settings.

Strengths:
- The goal of provably robustifying arbitrary pretrained classifiers without retraining the underlying weights is very interesting. The authors motivate the practical usefulness of the new technique well and support it with extensive experimental results.
- The authors successfully demonstrate the effectiveness of denoised smoothing on public vision APIs such as Azure, Google Cloud Vision, AWS, and Clarifai.

Weaknesses:
- The proposed denoising method is quite straightforward, so the novelty of the technique itself is limited. Specifically, the MSE objective is widely used in image denoising. Also, the stability objective was used in Li et al. (2019) to improve model robustness against large Gaussian noise, though the context there is slightly different.
- There is still a small but clear gap between the new technique and randomized smoothing (Cohen et al. (2019)), even in the white-box setting. However, this is not a big concern, considering that this is the first paper aiming to provably robustify pretrained classifiers.

Correctness: Correct.

Clarity: Well written.

Relation to Prior Work: Clearly discussed.

Reproducibility: Yes

Additional Feedback: *** Post Rebuttal *** Thanks for the response addressing the questions and concerns raised by the other reviewers. I will keep my original score.