Abstract: Recent studies show that neural models for natural language processing are often fragile under adversarial attacks (e.g., character-level insertion and word-level synonym substitution), which exposes their lack of robustness. Most defense techniques are tailored to attacks at a specific semantic level and cannot mitigate multi-level attacks simultaneously. Adversarial training has been shown to be effective in increasing model robustness; however, it often degrades performance on normal data, especially as the proportion of adversarial examples increases. To address this, we propose mixup regularized adversarial training (MRAT) against multi-level attacks. Our method exploits multiple adversarial examples to increase a model's intrinsic robustness without sacrificing performance on normal data. We evaluate our method on text classification and entailment tasks. Experimental results with different text encoders (BERT, LSTM, and CNN) under multi-level attacks show that our method consistently outperforms adversarial training.
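For intuition, the following is a minimal sketch of the mixup-regularization idea the abstract refers to: interpolate a clean example with an adversarial counterpart (and their labels) so the model trains on both without being dominated by either. The `embed`/`classify` method names, the Beta(alpha, alpha) mixing coefficient, and embedding-level interpolation are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta


def mixup_adversarial_loss(model, x_clean, x_adv, y_clean, y_adv, alpha=0.4):
    """Sketch of a mixup-regularized adversarial training loss.

    Assumptions (hypothetical, for illustration only):
      - `model.embed(token_ids)` returns embeddings of shape (batch, seq_len, dim)
      - `model.classify(embeddings)` returns class logits of shape (batch, n_classes)
      - `x_clean` and `x_adv` are aligned to the same sequence length
        (e.g., via padding), so they can be interpolated elementwise.
    """
    # Sample a mixing coefficient from Beta(alpha, alpha), as in standard mixup.
    lam = Beta(alpha, alpha).sample().item()

    # Interpolate the clean and adversarial inputs in embedding space.
    emb_clean = model.embed(x_clean)
    emb_adv = model.embed(x_adv)
    emb_mix = lam * emb_clean + (1.0 - lam) * emb_adv

    # Interpolate the labels with the same coefficient (soft targets).
    logits = model.classify(emb_mix)
    n_classes = logits.size(-1)
    y_mix = lam * F.one_hot(y_clean, n_classes).float() \
        + (1.0 - lam) * F.one_hot(y_adv, n_classes).float()

    # Cross-entropy against the mixed (soft) targets.
    return torch.sum(-y_mix * F.log_softmax(logits, dim=-1), dim=-1).mean()
```

In this sketch the adversarial example would typically be a perturbed version of the same instance, in which case `y_adv == y_clean` and the mixed target reduces to the original label; the mixture then acts purely as a regularizer that keeps clean-data performance from degrading as more adversarial examples are used.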