EFSG: Evolutionary Fooling Sentences Generator

Marco Di Giovanni, Marco Brambilla

Published: 2021, Last Modified: 28 Jan 2024, ICSC 2021

Abstract: Large pre-trained language representation models (LMs) have recently achieved great success in many NLP tasks. In 2018, BERT, and later its successors (e.g. RoBERTa), obtained state-of-the-art results on classical benchmarks such as GLUE. Several works on adversarial attacks have since been published to test their generalization properties and robustness. In this study, we propose the Evolutionary Fooling Sentences Generator (EFSG), a black-box, task-agnostic adversarial attack algorithm designed in an evolutionary fashion to generate false-positive sentences for binary classification tasks. We successfully apply EFSG to single-sentence (CoLA) and sentence-pair (MRPC) classification tasks on BERT and RoBERTa. The results reveal weak spots in state-of-the-art LMs. To complete the analysis, we perform transferability tests and an ablation study. Finally, adversarial training, used as a data-augmentation defence against EFSG, yields more robust models with no loss of accuracy.
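To make the idea of an evolutionary black-box fooling attack concrete, here is a minimal sketch of a generic genetic algorithm over word sequences. It is not EFSG itself: the abstract does not specify EFSG's fitness function, operators, or constraints, so this sketch assumes truncation selection with elitism, one-point crossover, and random word-level mutation. All names (`toy_classifier`, `mutate`, `crossover`, `evolve`, `MAX_LEN`) are hypothetical. The classifier is only queried for a score (black-box access), and the fitness is its probability for the positive class, so the search drifts toward sentences the model wrongly labels positive. In a real attack, `toy_classifier` would be replaced by a fine-tuned BERT or RoBERTa model, and the seeds would be negative examples from a task such as CoLA.

```python
import math
import random
from typing import Callable, List, Tuple

Sentence = List[str]
MAX_LEN = 12  # hypothetical cap so crossover cannot grow sentences without bound


def toy_classifier(sentence: Sentence) -> float:
    """Hypothetical black-box stand-in for a fine-tuned LM: returns P(positive).

    It naively rewards a few 'positive' words, so ungrammatical strings of
    those words get high scores: exactly the kind of false positive we hunt.
    """
    positive = {"good", "fine", "valid", "correct"}
    score = sum(w in positive for w in sentence) - 1.0
    return 1.0 / (1.0 + math.exp(-score))


def mutate(sentence: Sentence, vocab: List[str], rng: random.Random) -> Sentence:
    """Randomly replace, insert, or delete one word (never empties the sentence)."""
    s = list(sentence)
    op = rng.choice(("replace", "insert", "delete"))
    if op == "replace" and s:
        s[rng.randrange(len(s))] = rng.choice(vocab)
    elif op == "insert" or not s:
        s.insert(rng.randrange(len(s) + 1), rng.choice(vocab))
    elif len(s) > 1:
        del s[rng.randrange(len(s))]
    return s


def crossover(a: Sentence, b: Sentence, rng: random.Random) -> Sentence:
    """One-point crossover: a prefix of `a` joined to a suffix of `b`."""
    child = (a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):])[:MAX_LEN]
    return child or [rng.choice(a + b)]  # fall back to one parent word if empty


def evolve(
    fitness: Callable[[Sentence], float],
    seeds: List[Sentence],
    vocab: List[str],
    generations: int = 30,
    pop_size: int = 20,
    elite: int = 4,
    seed: int = 0,
) -> Tuple[Sentence, List[float]]:
    """Return the fittest sentence and the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [list(s) for s in seeds]
    while len(population) < pop_size:
        population.append(mutate(rng.choice(seeds), vocab, rng))
    history: List[float] = []
    for _ in range(generations):
        # Each fitness call is one black-box query; a real attack would cache scores.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:elite]  # elitism: the best survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), vocab, rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


if __name__ == "__main__":
    vocab = ["the", "cat", "good", "ran", "valid", "blue", "fine", "correct"]
    seeds = [["the", "cat", "ran"], ["blue", "the"]]
    best, history = evolve(toy_classifier, seeds, vocab)
    print(" ".join(best), round(history[-1], 3))
```

Because the elite parents are carried over unchanged, the best fitness never decreases from one generation to the next. With a real model, the interesting output is a sentence that a human would label negative (e.g. unacceptable in CoLA) but that the classifier scores as positive.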
