NeurIPS 2020

On the Trade-off between Adversarial and Backdoor Robustness


Meta Review

This paper postulates a trade-off between adversarial robustness and resilience to backdoor attacks. The underlying phenomenon is definitely of interest. However, upon closer inspection, it is not clear that the way the authors phrase their findings is the most illuminating or appropriate one (see the reviews for more details). Specifically, what the authors seem to find (and also mention in the paper) is that robust models tend to rely on different features than non-robust models and, because of that, robust models are more likely to pick up class-consistent data poisoning triggers. Structuring the paper (and the evidence provided) around this claim would likely lead to much less confusion (see the reviews and the important points made in them). Overall, this paper would be worth having in NeurIPS, provided the authors address the above point as well as those made in the reviews.