Abstract: In modern multimedia systems, adversarial text attacks are a vital means of exposing the vulnerability of deep neural networks and improving their robustness. However, existing methods have limitations. For example, character-level insertion attacks introduce misspellings, and word-level attacks tend to produce limited lexical variation. Although sentence-level attacks can greatly enrich sentence variety, they are less effective at fooling victim models and sometimes produce incorrect representations. In this paper, we propose the Parentheses Insertion Sentence-level Text Adversarial Attack (PI) algorithm, which crafts adversarial texts by inserting frequently used parentheses. Specifically, we first collect a parentheses set (\(P_{set}\)) in which every parenthesis is meaningless, so that the semantics of the sentence remain unchanged after insertion. We then use a beam search strategy to place the selected parentheses at appropriate positions in the text, improving the attack success rate (ASR). To evaluate the effectiveness of the PI method, we conduct extensive experiments attacking several popular models. Experimental results show that PI achieves a higher ASR than word-level and sentence-level baselines while preserving high semantic similarity and incurring minimal perturbation cost. Additionally, adversarial training with PI helps enhance the robustness of modern NLP models.
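To make the insertion-plus-beam-search idea concrete, here is a minimal sketch, not the authors' implementation. Several details are assumptions: the scoring objective (minimize the victim's probability for the true label), the stopping rule (stop once that probability drops below 0.5, i.e. a binary label flip), and the names `beam_search_insert`, `toy_positive_prob`, `WEIGHTS`, and `P_SET`, which are hypothetical. The victim is a toy logistic bag-of-words classifier standing in for a real model; its weight on "honest" is contrived to mimic a model that is spuriously sensitive to a semantically empty phrase.

```python
import math
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def beam_search_insert(
    tokens: Sequence[str],
    p_set: Sequence[Sequence[str]],
    true_label_prob: Callable[[Sequence[str]], float],
    beam_width: int = 3,
    max_insertions: int = 2,
) -> Tuple[float, Tokens]:
    """Hypothetical sketch: return (score, tokens) with the lowest true-label probability found."""
    start = list(tokens)
    beam: List[Tuple[float, Tokens]] = [(true_label_prob(start), start)]
    best = beam[0]
    for _ in range(max_insertions):
        candidates: List[Tuple[float, Tokens]] = []
        for _, cand in beam:
            # Try every parenthesis from the set at every gap, including both ends.
            for pos in range(len(cand) + 1):
                for phrase in p_set:
                    new = cand[:pos] + list(phrase) + cand[pos:]
                    candidates.append((true_label_prob(new), new))
        # Keep the beam_width candidates that lower the true-label probability most.
        candidates.sort(key=lambda c: c[0])
        beam = candidates[:beam_width]
        if beam[0][0] < best[0]:
            best = beam[0]
        if best[0] < 0.5:  # assumed binary setting: the predicted label has flipped
            break
    return best


# Hypothetical toy victim: logistic bag-of-words "positive sentiment" classifier.
WEIGHTS = {"great": 2.0, "honest": -1.5, "frankly": -0.5}


def toy_positive_prob(tokens: Sequence[str]) -> float:
    z = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parentheses set; each entry is a tokenized parenthesis.
P_SET = [["frankly", "speaking"], ["to", "be", "honest"]]
score, adv = beam_search_insert("the movie was great".split(), P_SET, toy_positive_prob)
print(round(score, 3), " ".join(adv))
```

Keeping a beam rather than committing greedily to the single best insertion lets later insertions build on candidates that were not locally optimal; a real attack would also filter candidates by a semantic-similarity constraint, which this sketch omits.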