- Keywords: quantization, adversarial example, robustness, convolutional neural network, concept
- TL;DR: We propose a quantization-based method that regularizes a CNN's learned representations to align automatically with a trainable concept matrix, thereby effectively filtering out adversarial perturbations.
- Abstract: Recent research has extensively revealed the vulnerability of deep neural networks, especially convolutional neural networks (CNNs) on the task of image recognition, by creating adversarial samples that differ only "slightly" from legitimate samples. This vulnerability indicates that these powerful models are sensitive to specific perturbations and cannot filter out these adversarial perturbations. In this work, we propose a quantization-based method that enables a CNN to filter out adversarial perturbations effectively. Notably, unlike prior work on input quantization, we apply quantization in the intermediate layers of a CNN. Our approach is naturally aligned with the clustering of the coarse-grained semantic information learned by a CNN. Furthermore, to compensate for the loss of information inevitably caused by quantization, we propose multi-head quantization, where we project data points onto different sub-spaces and perform quantization within each sub-space. We encapsulate our design in a quantization layer named the Q-Layer. Results on the MNIST and Fashion-MNIST datasets demonstrate that adding a single Q-Layer to a CNN significantly improves its robustness against both white-box and black-box attacks.
- Code: https://github.com/AnonymousWorld123/QLayer
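To make the multi-head quantization idea from the abstract concrete, here is a minimal NumPy sketch, written under stated assumptions: the function name `multi_head_quantize`, the codebook shapes, and the nearest-neighbor assignment rule are illustrative choices, not the authors' implementation (the actual Q-Layer is trainable and lives inside a CNN; see the linked repository).

```python
import numpy as np

def multi_head_quantize(x, codebooks):
    """Quantize feature vectors head-by-head against per-head concept matrices.

    x:         (batch, dim) intermediate-layer activations.
    codebooks: (heads, n_concepts, dim // heads) per-sub-space concept matrix.
    Each feature vector is split into `heads` sub-vectors (one per sub-space);
    each sub-vector is replaced by its nearest concept in that sub-space.
    """
    heads, n_concepts, sub_dim = codebooks.shape
    batch = x.shape[0]
    # Project each feature vector into per-head sub-spaces (here: a simple split).
    x_sub = x.reshape(batch, heads, sub_dim)
    out = np.empty_like(x_sub)
    for h in range(heads):
        # Squared distance from every sub-vector to every concept of head h.
        d = ((x_sub[:, h, None, :] - codebooks[h][None]) ** 2).sum(axis=-1)
        nearest = d.argmin(axis=1)            # (batch,) concept indices
        out[:, h, :] = codebooks[h][nearest]  # snap to the nearest concept
    return out.reshape(batch, heads * sub_dim)

rng = np.random.default_rng(0)
codebooks = rng.normal(size=(4, 8, 16))  # 4 heads, 8 concepts, 16 dims each
x = rng.normal(size=(2, 64))             # 2 activations of dimension 64
q = multi_head_quantize(x, codebooks)
print(q.shape)  # (2, 64)
```

Quantizing per sub-space rather than over the full feature vector is what compensates for the information loss mentioned in the abstract: with 4 heads of 8 concepts each, the layer can represent 8^4 distinct quantized outputs instead of only 8.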