NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center

### Reviewer 1

This paper proposes a new, more efficient method for adversarial training. The performance of the proposed training protocol is comparable to state-of-the-art results in adversarial training, while being efficient enough to adversarially train a model on ImageNet on a workstation. Experimental results are presented on CIFAR-10, CIFAR-100 and ImageNet.

Originality: The idea of reusing the backward pass required for training to also compute adversarial samples does seem novel. Projected gradient descent (PGD) adversarial samples normally require multiple backward passes. To obtain strong adversarial samples for training, the same minibatch is used several times in a row to train the model, and each replay also advances the PGD iteration using the gradient of the updated model. The total number of training epochs is divided by the number of replays of each minibatch, so that the number of training iterations matches that of natural training. The computation time of the proposed protocol is therefore comparable to that of natural training.

Quality: The idea of "warm starting" each new minibatch with the perturbation values from the previous minibatch is not well motivated, and no justification or ablation study is provided to analyze the impact of this choice. What happens when no warm start is used? How much does the final attack differ from its initialization value? Is this pushing the attack towards some notion of universal perturbation? The paper places strong emphasis on the fact that the proposed protocol is designed for untargeted adversarial training. It would be good to see a comparison with previous (targeted) results on ImageNet from [Xie et al., 2019] and [Kannan et al., 2018]. Some aspects of the experimental section are not fully convincing, as the attacks used for evaluation are arguably not strong enough.
The attacks against ImageNet (Table 3) seem to use $\epsilon=2$ (over 255?), which is too small a value to reflect the robustness of the model. I was also unable to find the exact parameters used when testing against the C&W attack (Table 1), and this attack was only evaluated on CIFAR-10. In most cases, evaluation against the PGD attack does not seem to use random restarts (except for one configuration in Table 1), although random restarts are known to make the attack considerably stronger. The paper mentions the SPSA black-box attack in the experimental section, but then fails to compare against it, claiming that it would not perform well anyway. The number of repetitions $m$ of the same minibatch seems to have a strong impact on both clean and adversarial accuracy (a trade-off). How would one tune it efficiently in practice?

Clarity: The paper is overall well written. Using both K-PGD and PGD-K notations can be a source of confusion.

Significance: Provided that the method proposed in the paper is sound and obtains the claimed performance, it would offer a more efficient way to train a robust model.

Minor remarks:

- Lines 74-75: The main difference between BIM and PGD is actually the projection step, which PGD performs (and which gives the method its name) but BIM does not.
- Line 164: Possibly incorrect reference to Section 4.
- Line 193: Extra word "reference".
- Alg. 1, line 8: Extra square bracket.
- Alg. 1, line 12: This should probably be a projection operation, not clipping, in order to generalize beyond $L_\infty$.

[UPDATE] I would like to thank the authors for their detailed explanations and additional experiments. These have provided some additional clarity and should be included in the paper. In view of the rebuttal, some concerns still remain.
I believe that testing the proposed adversarial training strategy against stronger attacks (e.g., using high confidence in the C&W attack, or a larger $\epsilon$ budget for the others) would prove the robustness of the obtained model beyond a doubt. I am, however, increasing my rating from 5 to 7 in view of the rebuttal.
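
The remark on Alg. 1, line 12 (projection rather than clipping) can be made concrete with a small sketch. This is not code from the paper: the function names `project_linf` and `project_l2` are hypothetical, and the perturbation is assumed to be a single flat NumPy vector. It only illustrates that element-wise clipping is the projection onto an $L_\infty$ ball, while other norms (here $L_2$) need a different projection.

```python
import numpy as np

def project_linf(delta, eps):
    """Project delta onto the L_inf ball of radius eps.

    For the L_inf norm, projection reduces to element-wise clipping.
    """
    return np.clip(delta, -eps, eps)

def project_l2(delta, eps):
    """Project delta onto the L2 ball of radius eps.

    Vectors already inside the ball are returned unchanged (as a copy);
    vectors outside are rescaled onto the surface of the ball.
    """
    norm = np.linalg.norm(delta)
    if norm <= eps:
        return delta.copy()
    return delta * (eps / norm)

# Illustrative perturbation with L_inf norm 4 and L2 norm 5.
delta = np.array([3.0, -4.0])
print(project_linf(delta, 1.0))  # each coordinate limited to [-1, 1]
print(project_l2(delta, 1.0))    # same direction, rescaled to unit L2 norm
```

Under this view, a generic "project onto the allowed perturbation set" step covers both cases, whereas a hard-coded clip only covers $L_\infty$.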

### Reviewer 2

Originality: The paper has essentially one original idea: using the backward pass of the backpropagation algorithm to also compute the adversarial example. On the one hand, it is really impactful, because the authors show empirically that it speeds up training while maintaining equal robustness to adversarial attacks; on the other hand, the idea itself isn't really outstanding.

Quality: The paper gives experimental verification of the idea and claims to achieve state-of-the-art robustness on the CIFAR datasets. The paper also gives detailed experimental results, such as the training time, and shows that it is indeed close to the time taken for natural training. There is also a section explaining that the loss surface obtained with the technique is flat and smooth, and that its adversarial examples look like the actual target class. These properties are also seen in standard adversarial training, so the technique is similar to standard adversarial training in these aspects as well. Therefore, the quality of the paper is good.

Clarity: The paper is well written.

Significance: The significance would be really high, because training robust models would be almost as fast as training non-robust models. This would greatly benefit robust machine learning research. Having said that, other than this one idea, the paper makes no other contributions.
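
To illustrate the single idea both reviewers describe, here is a minimal sketch of minibatch-replay adversarial training on a toy logistic-regression model. It is an assumption-laden illustration, not the authors' implementation: the function name `free_style_adv_train`, the toy data, the learning rate, and the choice of $\epsilon$ as the perturbation step size are all hypothetical. What it shows is that one backward pass yields both the weight gradient and the input gradient, that each minibatch is replayed $m$ times, that the perturbation is warm-started from the previous minibatch, and that the epoch count is divided by $m$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def free_style_adv_train(X, y, eps=0.1, m=4, natural_epochs=40,
                         batch_size=32, lr=0.5, seed=0):
    """Toy minibatch-replay adversarial training (hypothetical sketch).

    Each minibatch is replayed m times. Every replay reuses one backward
    pass for both the model update and the perturbation update. The number
    of epochs is natural_epochs // m, so the total number of weight updates
    matches natural training.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    # Perturbation is initialised once and carried over between minibatches
    # (the "warm start" questioned by Reviewer 1).
    delta = np.zeros((batch_size, d))
    for _epoch in range(natural_epochs // m):
        order = rng.permutation(n)
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            for _replay in range(m):
                x_adv = xb + delta
                # dLoss/dlogit: shared by the weight and input gradients.
                residual = sigmoid(x_adv @ w + b) - yb
                grad_w = x_adv.T @ residual / batch_size
                grad_b = residual.mean()
                grad_x = np.outer(residual, w)  # per-sample dLoss/dinput
                # Descent step on the model parameters.
                w -= lr * grad_w
                b -= lr * grad_b
                # Ascent step on the perturbation, then clip to the L_inf ball.
                delta = np.clip(delta + eps * np.sign(grad_x), -eps, eps)
    return w, b

# Toy data: two Gaussian blobs (assumed purely for illustration).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.5, (256, 2)), rng.normal(1.0, 0.5, (256, 2))])
y = np.concatenate([np.zeros(256), np.ones(256)])
w, b = free_style_adv_train(X, y)
clean_acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(f"clean accuracy: {clean_acc:.3f}")
```

Setting `m=1` in this sketch gives one perturbation step per weight update with no replay, which is one simple way to probe the clean-versus-robust trade-off in $m$ that Reviewer 1 asks about.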