
Class-Disentanglement and Applications in Adversarial Detection and Defense

Part of Advances in Neural Information Processing Systems 34 (NeurIPS 2021)


Authors

Kaiwen Yang, Tianyi Zhou, Yonggang Zhang, Xinmei Tian, Dacheng Tao

Abstract

What is the minimum necessary information required by a neural net $D(\cdot)$ from an image $x$ to accurately predict its class? Extracting such information in the input space from $x$ can locate the areas that $D(\cdot)$ mainly attends to and shed novel insight on the detection and defense of adversarial attacks. In this paper, we propose ''class-disentanglement'', which trains a variational autoencoder $G(\cdot)$ to extract this class-dependent information as $x - G(x)$ via a trade-off between reconstructing $x$ by $G(x)$ and classifying $x$ by $D(x - G(x))$, where the former competes with the latter in decomposing $x$ so that $x - G(x)$ retains only the information necessary for classification. We apply this decomposition to both clean images and their adversarial counterparts and discover that the perturbations generated by adversarial attacks mainly lie in the class-dependent part $x - G(x)$. The decomposition results also provide novel interpretations of classification and attack models. Inspired by these observations, we propose to conduct adversarial detection on $x - G(x)$ and adversarial defense on $G(x)$, which consistently outperform detection and defense on the original $x$. In experiments, this simple approach substantially improves detection and defense against different types of adversarial attacks.
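The abstract describes training $G(\cdot)$ with a trade-off between reconstruction of $x$ and classification of the residual $x - G(x)$. Below is a minimal sketch of one way such an objective could be written in PyTorch; it is not the authors' implementation. The interface of `G` (returning a reconstruction plus the VAE's mean and log-variance), the trade-off weights `beta` and `lam`, and the use of MSE for reconstruction are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def class_disentanglement_loss(G, D, x, y, beta=1.0, lam=1.0):
    """Hypothetical training loss combining VAE reconstruction of x
    with classification of the class-dependent residual x - G(x).

    G: variational autoencoder, assumed to return (reconstruction, mu, logvar)
    D: pretrained classifier applied to the residual
    x: batch of images, y: ground-truth labels
    beta, lam: assumed trade-off weights (not specified in the abstract)
    """
    g_x, mu, logvar = G(x)

    # Reconstruction term: G(x) should stay close to x,
    # pulling class-irrelevant content into G(x).
    rec_loss = F.mse_loss(g_x, x)

    # Standard VAE KL regularizer on the latent distribution.
    kl_loss = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # Classification term: the residual x - G(x) must remain
    # classifiable by D, so it keeps the class-dependent information.
    cls_loss = F.cross_entropy(D(x - g_x), y)

    return rec_loss + beta * kl_loss + lam * cls_loss
```

Under this reading, adversarial detection would then operate on the residual $x - G(x)$ (where the perturbations are reported to concentrate) and defense on the reconstruction $G(x)$.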