Abstract: Recent work has shown that current text classification models are vulnerable to small adversarial perturbations of their inputs, and adversarial training, which re-trains models on adversarial examples, is the most popular way to alleviate the impact of such perturbations. However, current adversarial training methods have two principal problems: a drop in the model's generalization and ineffective defense against other text attacks. In this paper, we propose a Keyword-bias-aware Adversarial Text Generation model (KATG) that implicitly generates adversarial sentences using a generator-discriminator structure. Instead of using a single benign sentence to generate an adversarial sentence, the KATG model utilizes multiple additional benign sentences (namely, prior sentences) to guide adversarial sentence generation. Furthermore, to cover more of the perturbations used in existing attacks, a keyword-bias-based sampling method is proposed to select sentences containing biased words as prior sentences. In addition, to effectively utilize the prior sentences, a generative flow mechanism is proposed to construct a latent semantic space for learning a latent representation of the prior sentences. Experiments demonstrate that adversarial sentences generated by our KATG model can strengthen both the generalization and the robustness of text classification models.