Robustifying Language Models via Adversarial Training with Masked Gradient

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission
Keywords: NLP, language model, robustness, classification
Abstract: Fine-tuning pre-trained language models (LMs) has become the de-facto standard method for improving state-of-the-art performance on various NLP tasks. Although these models are usually evaluated by accuracy on a fixed validation set, such evaluation is insufficient for the reliable deployment of fine-tuned LMs in real-world settings, as it overlooks known weaknesses such as poor adversarial robustness and miscalibration. To address these issues, we propose a simple yet effective training algorithm, coined Robustifying LMs via Adversarial training with Masked gradient (RAM), to improve the robustness of fine-tuned LMs. In particular, we leverage adversarial training to robustify LMs against various types of perturbations. Simultaneously, to prevent the trained model from deviating too far from the initial pre-trained model, we selectively update the important model parameters using masked gradients; their relative importance is estimated from the gradients computed during training. Consequently, the model preserves the generalizability of the pre-trained model while improving its robustness. Additionally, we construct a new benchmark that evaluates the robustness of fine-tuned LMs along four representative aspects of model robustness in a unified way. On this benchmark, we demonstrate the effectiveness of RAM compared to other state-of-the-art fine-tuning methods and verify that RAM successfully robustifies various types of LMs. Our work suggests a rethinking of the robustness aspect of LMs as an essential direction for their reliable deployment, along with a simple yet effective solution.
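A minimal, self-contained sketch (not the authors' implementation) of the two ideas named in the abstract: (1) adversarial perturbations applied in embedding space and (2) gradient masking so that only parameters deemed important are updated. The one-step sign-gradient perturbation, the gradient-magnitude importance criterion, and the hyperparameters `eps` and `mask_ratio` are all assumptions for illustration; the abstract does not specify RAM's exact procedure.

```python
# Hedged sketch of adversarial training with masked gradient updates.
# Assumptions (not from the paper): FGSM-style one-step perturbation of the
# embeddings, and per-tensor gradient-magnitude masking of parameter updates.
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyClassifier(nn.Module):
    """Toy stand-in for a fine-tuned LM classifier: embed -> mean pool -> linear head."""
    def __init__(self, vocab=100, dim=32, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, classes)

    def forward(self, ids, delta=None):
        x = self.emb(ids)
        if delta is not None:           # add adversarial perturbation in embedding space
            x = x + delta
        return self.head(x.mean(dim=1))

model = TinyClassifier()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

ids = torch.randint(0, 100, (8, 16))    # fake batch of token ids
labels = torch.randint(0, 2, (8,))
eps, mask_ratio = 1e-2, 0.5             # assumed hyperparameters

for step in range(3):
    # 1) Adversarial step: one-step perturbation of the embeddings.
    delta = torch.zeros(8, 16, 32, requires_grad=True)
    loss_adv = loss_fn(model(ids, delta), labels)
    grad_delta, = torch.autograd.grad(loss_adv, delta)
    delta = eps * grad_delta.sign()     # worst-case direction, detached from the graph

    # 2) Train on the perturbed input.
    opt.zero_grad()
    loss = loss_fn(model(ids, delta), labels)
    loss.backward()

    # 3) Masked gradient: keep only the largest-magnitude gradient entries per tensor,
    #    so that only the (assumed) important parameters are updated.
    for p in model.parameters():
        if p.grad is None:
            continue
        g = p.grad.abs().flatten()
        k = max(1, int(mask_ratio * g.numel()))
        thresh = g.topk(k).values.min()
        p.grad.mul_((p.grad.abs() >= thresh).float())

    opt.step()
    print(f"step {step}: loss {loss.item():.4f}")
```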
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (e.g., AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)