Less is More: Feature Selection for Adversarial Robustness with Compressive Counter-Adversarial Attacks
Keywords: Adversarial learning, compression, counter-attack, activation suppression
TL;DR: We investigate consistency measures on latent feature representations and propose using counter-adversarial attacks to improve robustness against adversarial attacks in image classification.
Abstract: A common observation regarding adversarial attacks is that they mostly give rise to false activations at the penultimate layer to fool the classifier. Assuming that these activation values correspond to certain features of the input, the objective becomes choosing the features that are most useful for classification. Hence, we propose a novel approach to identify the important features by employing counter-adversarial attacks, which highlights the consistency at the penultimate layer with respect to perturbations on input samples. First, we empirically show that there exists a subset of features such that classification based on them bridges the gap between clean and robust accuracy. Second, we propose a simple yet efficient mechanism to identify those features by searching the neighborhood of the input sample. We then select features by observing the consistency of the activation values at the penultimate layer.
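The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: uniform random perturbations in an L-infinity ball stand in for the counter-adversarial attack, a single ReLU layer (`penultimate_features`, a hypothetical helper) stands in for the network's penultimate layer, and consistency is scored as the mean absolute deviation from the clean activation. The names `select_consistent_features`, `eps`, `n_samples`, and `keep` are illustrative only.

```python
import numpy as np


def penultimate_features(x, W):
    """Hypothetical stand-in for a penultimate layer: ReLU(x @ W)."""
    return np.maximum(x @ W, 0.0)


def select_consistent_features(x, W, eps=0.1, n_samples=64, keep=4, seed=0):
    """Return a boolean mask over features and the per-feature deviation.

    Assumption: random points in an L-infinity ball of radius eps around x
    replace the counter-adversarial attack described in the abstract.
    """
    rng = np.random.default_rng(seed)
    # Sample the neighborhood of the input.
    noise = rng.uniform(-eps, eps, size=(n_samples, x.shape[0]))
    perturbed_acts = penultimate_features(x[None, :] + noise, W)
    clean_acts = penultimate_features(x, W)
    # Inconsistency score: mean absolute change of each activation
    # across the sampled neighborhood.
    deviation = np.abs(perturbed_acts - clean_acts[None, :]).mean(axis=0)
    # Keep the `keep` features whose activations change the least.
    keep_idx = np.argsort(deviation)[:keep]
    mask = np.zeros(W.shape[1], dtype=bool)
    mask[keep_idx] = True
    return mask, deviation


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(8, 10))   # toy weights: 8 inputs, 10 features
    x = rng.normal(size=8)         # toy input sample
    mask, deviation = select_consistent_features(x, W)
    # Suppress the inconsistent activations before classification.
    selected = penultimate_features(x, W) * mask
    print(mask, selected)
```

Under this scoring, a feature that stays at zero for every sampled point counts as perfectly consistent, so a practical criterion would likely need to weigh consistency against how informative the feature is.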