BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification

Anonymous

16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission
Abstract: In this work, we investigate the robustness of BERT under four word substitution-based attacks. Combining a human evaluation of individual word substitutions with a probabilistic analysis, we show that between 96% and 99% of the analyzed attacks do not preserve semantics, indicating that their success rests mainly on feeding invalid data to the model. To further confirm this, we introduce an efficient data augmentation procedure and show that many successful attacks can be prevented by including data similar to adversarial examples during training. Compared to traditional adversarial training, our data augmentation procedure requires 30x less computation time per epoch while achieving better performance on two out of three datasets. We introduce an additional post-processing step that reduces the success rates of state-of-the-art attacks to below 4%, 5%, and 8% on the three datasets considered. Finally, by examining constraints on word substitutions that better preserve semantics, we conclude that BERT is considerably more robust than previous research suggests.
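The abstract only sketches the data augmentation procedure. As a rough illustration of the general idea (perturbing training texts with synonym substitutions so the model sees adversarial-like inputs during training), here is a minimal Python sketch. It is not the authors' implementation: the function names, the toy synonym table, and the substitution rate are all illustrative assumptions; the paper's procedure would draw candidates from a larger substitution resource.

```python
# Minimal sketch of synonym-substitution data augmentation (hypothetical,
# not the paper's code). Each training text is copied with a fraction of
# its words replaced by synonyms, mimicking word substitution attacks.
import random

# Toy synonym table for illustration only; a real procedure would use a
# larger resource such as counter-fitted embedding neighbours or WordNet.
SYNONYMS = {
    "good": ["fine", "decent"],
    "bad": ["poor", "awful"],
    "movie": ["film", "picture"],
}

def augment(text: str, rate: float = 0.2, rng: random.Random = None) -> str:
    """Replace a fraction of substitutable words with a random synonym."""
    rng = rng or random.Random(0)
    out = []
    for word in text.split():
        key = word.lower()
        if key in SYNONYMS and rng.random() < rate:
            out.append(rng.choice(SYNONYMS[key]))
        else:
            out.append(word)
    return " ".join(out)

def augment_dataset(texts, labels, copies: int = 1, rate: float = 0.2):
    """Return the original pairs plus `copies` perturbed variants of each text."""
    rng = random.Random(42)
    aug_texts, aug_labels = list(texts), list(labels)
    for text, label in zip(texts, labels):
        for _ in range(copies):
            aug_texts.append(augment(text, rate, rng))
            aug_labels.append(label)  # perturbations keep the original label
    return aug_texts, aug_labels

if __name__ == "__main__":
    texts = ["a good movie", "a bad movie"]
    labels = [1, 0]
    print(augment_dataset(texts, labels))
```

Because the augmented copies are generated once per dataset rather than by running an attack against the model at every step, this kind of procedure is far cheaper per epoch than traditional adversarial training, which is consistent with the 30x speedup reported in the abstract.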