On the Effectiveness of Adapting Pre-trained Transformer Models via Adversarial Noise

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Fast adaptation, Pre-trained Transformer Networks, Natural Language Understanding
TL;DR: We investigate the trade-off between computational efficiency and generalization when adapting pre-trained Transformers to natural language understanding tasks, and propose a method that accelerates model adaptation by up to 9.8 times.
Abstract: Pretraining Transformer-based language models and then adapting the pre-trained models to a downstream task is an effective transfer mechanism in NLP. While it is well known that the pretraining stage is computationally expensive, downstream adaptation is also becoming costly, as Transformers rapidly grow in size and fine-tuning pre-trained Transformers sees increasingly wide use. In this work, we find that techniques that have demonstrated success in accelerating pre-training, such as large-batch optimization, lead to severe accuracy degradation. We find that strong regularization techniques such as adversarial training help to close the accuracy gap. However, the computational complexity of this approach, due to the high cost of generating adversaries, prevents it from reducing adaptation costs even with a large number of GPUs. We therefore systematically study both the computational efficiency and the generalization of adversarial training for adapting pre-trained Transformers under a large-batch optimization regime. Our investigation yields simple yet effective algorithms for adapting Transformer models. We show in experiments that our proposed method attains up to 9.8$\times$ adaptation speedups over the baseline on BERT$_{base}$ and RoBERTa$_{large}$, while achieving comparable and sometimes higher accuracy than fine-tuning with existing baselines.
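To make concrete why adversary generation dominates the cost of adversarial training, the following toy sketch runs a generic PGD-style perturbation in embedding space. This is not the paper's algorithm; the model (a logistic regression standing in for a Transformer's embedding layer), the step sizes, and the perturbation budget are all illustrative assumptions. The point it shows is structural: each of the K attack steps requires an extra forward and backward pass, so K-step adversarial fine-tuning costs roughly (K+1)x a plain training step.

```python
import numpy as np

# Hypothetical stand-in for a Transformer: logistic regression over a fixed
# input "embedding". We perturb the embedding (not the weights), as is common
# in embedding-space adversarial training for NLP.
rng = np.random.default_rng(0)
w = rng.normal(size=4)        # frozen model weights for this sketch
emb = rng.normal(size=4)      # input embedding to be perturbed
y = 1.0                       # gold label

def loss_and_grad(e):
    """Binary cross-entropy loss and its gradient w.r.t. the embedding."""
    p = 1.0 / (1.0 + np.exp(-w @ e))
    loss = -np.log(p) if y == 1.0 else -np.log(1.0 - p)
    return loss, (p - y) * w

eps, alpha, K = 0.1, 0.05, 3  # L2 budget, attack step size, number of PGD steps
delta = np.zeros_like(emb)
for _ in range(K):
    # Each iteration costs one extra forward + backward pass: this is the
    # overhead the abstract refers to when adversary generation is expensive.
    _, g = loss_and_grad(emb + delta)
    delta += alpha * g / (np.linalg.norm(g) + 1e-12)          # ascend the loss
    delta *= min(1.0, eps / (np.linalg.norm(delta) + 1e-12))  # project onto L2 ball

clean_loss, _ = loss_and_grad(emb)
adv_loss, _ = loss_and_grad(emb + delta)
# The adversarial example raises the loss while staying within the eps-ball;
# the training step would then update the model on this harder input.
```

In real fine-tuning the forward/backward passes go through the full Transformer, so the K extra passes per batch are what prevents naive adversarial training from reducing wall-clock adaptation cost, even when more GPUs are available.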
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning