Turn Up the Heat: Training with High Temperatures Boosts Robustness Against Unseen Adversarial Attacks

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: Training with higher temperatures can boost adversarial robustness
Abstract: Deep learning models have achieved remarkable performance across various domains, but are vulnerable to adversarial attacks. Existing defences such as adversarial training face challenges when applied to NLP models due to their computational cost, while others are specific to a single attack form. A prevalent practical strategy is augmentation-based adversarial training, where adversarial examples are included in the training set. While successful, this approach largely improves robustness only against the specific attack forms the model is trained on, and its training time scales linearly with the augmentation factor. We propose a simple modification to the standard training algorithm that boosts absolute accuracy in the presence of adversarial examples by up to 14 accuracy points, without increasing model training time. Our modification is the use of a high temperature parameter during training to scale down the logits predicted by classification systems. Finally, we show that this high-temperature training approach complements existing adversarial training techniques, further improving the robustness of augmentation-based, adversarially trained NLP systems against unseen adversarial attacks.
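The abstract describes the modification only at a high level: divide the predicted logits by a temperature T > 1 before the softmax in the training loss. A minimal sketch of temperature-scaled cross-entropy (the function names and the example value T = 10 are illustrative assumptions, not taken from the paper):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_with_temperature(logits, target, T=10.0):
    # Scale down the logits by T before the softmax; T > 1 flattens
    # the predicted distribution during training, T = 1 recovers the
    # standard cross-entropy loss.
    probs = softmax([z / T for z in logits])
    return -math.log(probs[target])

# Example: a confident prediction for class 0.
logits = [4.0, 1.0, 0.5]
loss_standard = cross_entropy_with_temperature(logits, 0, T=1.0)
loss_high_T = cross_entropy_with_temperature(logits, 0, T=10.0)
# With high T the softmax output is flatter, so the loss (and its
# gradient) stays informative even for confidently classified examples.
```

At inference time the temperature would simply be dropped (or set to 1), since dividing all logits by a constant does not change the argmax prediction.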
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English