Improving Model Robustness against Adversarial Examples with Redundant Fully Connected Layer

Published: 01 Jan 2024, Last Modified: 30 Sept 2024, WWW (Companion Volume) 2024, CC BY-SA 4.0
Abstract: Recent studies show that deep neural networks, particularly image classification models, are highly vulnerable to adversarial examples. However, existing defense techniques exhibit limitations in their adaptability to different attacks, in the trade-off between clean-instance accuracy and robust accuracy, and in the training-time overhead they incur. To tackle these problems, we present a novel component, named the redundant fully connected layer, which can be combined with existing model backbones in a pluggable manner. Specifically, we design a tailor-made loss function that leverages cosine similarity to maximize the difference and diversity among the multiple fully connected parts. We conduct extensive experiments against 12 representative attacks (white-box and black-box) on a popular benchmark dataset. The empirical evaluations show that our scheme achieves significant robustness gains against various attacks with negligible additional training overhead, while causing little collateral damage to clean-instance accuracy.
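To illustrate the idea described in the abstract, the following is a minimal sketch (not the authors' released code) of what a pluggable redundant fully connected layer with a cosine-similarity diversity penalty might look like in PyTorch. The names `RedundantFC`, `diversity_loss`, `num_heads`, and the weighting coefficient `lambda_div` are illustrative assumptions, not taken from the paper.

```python
# Sketch of a redundant fully connected classifier head: several parallel FC
# parts whose weights are pushed apart by a cosine-similarity penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RedundantFC(nn.Module):
    """Replace a backbone's single classifier head with several parallel heads."""

    def __init__(self, in_features: int, num_classes: int, num_heads: int = 3):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(in_features, num_classes) for _ in range(num_heads)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Average the logits of all redundant heads.
        logits = torch.stack([head(features) for head in self.heads], dim=0)
        return logits.mean(dim=0)

    def diversity_loss(self) -> torch.Tensor:
        # Penalize pairwise cosine similarity between flattened head weights,
        # encouraging the redundant parts to remain different from one another.
        flat = [head.weight.flatten() for head in self.heads]
        loss = torch.zeros((), device=flat[0].device)
        pairs = 0
        for i in range(len(flat)):
            for j in range(i + 1, len(flat)):
                loss = loss + F.cosine_similarity(flat[i], flat[j], dim=0).abs()
                pairs += 1
        return loss / max(pairs, 1)


# Example training step: cross-entropy on the averaged logits plus the
# diversity penalty, weighted by a hypothetical coefficient lambda_div.
if __name__ == "__main__":
    head = RedundantFC(in_features=512, num_classes=10, num_heads=3)
    features = torch.randn(8, 512)          # stand-in for backbone features
    labels = torch.randint(0, 10, (8,))
    lambda_div = 0.1
    loss = F.cross_entropy(head(features), labels) + lambda_div * head.diversity_loss()
    loss.backward()
```

Averaging the head outputs keeps the component drop-in compatible with a standard single-head backbone, while the diversity term is simply added to the usual classification loss during training.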