Attention-based Interpretation and Response to The Trade-Off of Adversarial Training

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Keywords: Adversarial training, Trade-off
Abstract: To boost the robustness of a model against adversarial examples, adversarial training has been regarded as a benchmark method. In practice, however, it is commonly observed to suffer from a trade-off between robustness and generalization. This paper offers an intuitive explanation of this phenomenon from the perspective of model attention and provides an attention-expansion viewpoint for learning a reliable model. Specifically, we argue that adversarial training does enable a model to concentrate on the exact semantic information of the input, which helps prevent the accumulation of adversarial perturbations. However, it also tends to make the model cover a smaller spatial region, so that the model often ignores some inherent features of the input. This may be a main cause of weak generalization on unseen inputs. To address this issue, we propose an Attention-Extended Learning Framework (AELF) built on the cascade structure of deep models. AELF advocates using clean high-level features (extracted from natural inputs), rather than hand-crafted labels, to guide robust learning, so as to ensure broad spatial attention of the model over the input space. In addition, we provide a very simple implementation of AELF within the efficient softmax-based training paradigm, which avoids measuring the difference between high-dimensional embedding vectors via an additional regularization loss. Experimental observations verify the rationality of our interpretation, and remarkable improvements on multiple datasets also demonstrate the superiority of AELF.
One-sentence Summary: To address the trade-off issue of adversarial training, this paper gives a straightforward interpretation of the trade-off and provides a simple solution.
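The abstract does not give implementation details, so the following PyTorch sketch is only one plausible reading of the described idea; the function names pgd_attack and aelf_style_step and all hyperparameters are hypothetical. Under this reading, the softmax output on the clean input (standing in for the "clean high-level features") serves as the soft training target for the adversarial input, replacing the one-hot hand-crafted label, so that the guidance stays inside ordinary softmax training and needs no separate embedding-distance regularizer.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Standard PGD in the L-inf ball (an assumed attack; the abstract does
    # not specify the attack used during training).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
    return x_adv

def aelf_style_step(model, x, y, optimizer):
    # One training step under this reading: the clean softmax distribution
    # supervises the adversarial input instead of the one-hot label.
    x_adv = pgd_attack(model, x, y)
    with torch.no_grad():
        clean_probs = F.softmax(model(x), dim=1)  # soft targets from the natural input
    adv_log_probs = F.log_softmax(model(x_adv), dim=1)
    loss = F.kl_div(adv_log_probs, clean_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Compared with adding an explicit penalty between clean and adversarial embedding vectors, this keeps the objective entirely in probability space, which is one way to realize the abstract's claim of avoiding an additional regularization loss over high-dimensional embeddings.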