- TL;DR: We demonstrate vulnerability to undersensitivity attacks in SQuAD2.0 and NewsQA neural reading comprehension models, where the model predicts the same answer with increased confidence to adversarially chosen questions, and compare defence strategies.
- Abstract: Neural reading comprehension models have recently achieved impressive gener- alisation results, yet still perform poorly when given adversarially selected input. Most prior work has studied semantically invariant text perturbations which cause a model’s prediction to change when it should not. In this work we focus on the complementary problem: excessive prediction undersensitivity where input text is meaningfully changed, and the model’s prediction does not change when it should. We formulate a noisy adversarial attack which searches among semantic variations of comprehension questions for which a model still erroneously pro- duces the same answer as the original question – and with an even higher prob- ability. We show that – despite comprising unanswerable questions – SQuAD2.0 and NewsQA models are vulnerable to this attack and commit a substantial frac- tion of errors on adversarially generated questions. This indicates that current models—even where they can correctly predict the answer—rely on spurious sur- face patterns and are not necessarily aware of all information provided in a given comprehension question. Developing this further, we experiment with both data augmentation and adversarial training as defence strategies: both are able to sub- stantially decrease a model’s vulnerability to undersensitivity attacks on held out evaluation data. Finally, we demonstrate that adversarially robust models gener- alise better in a biased data setting with a train/evaluation distribution mismatch; they are less prone to overly rely on predictive cues only present in the training set and outperform a conventional model in the biased data setting by up to 11% F1.
- Keywords: reading comprehension, undersensitivity, adversarial questions, adversarial training, robustness, biased data setting