Mitigating Bias in Natural Language Inference Using Adversarial Learning

Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme

Sep 27, 2018 · ICLR 2019 Conference · Withdrawn Submission
  • Abstract: Recognizing the relationship between two texts is an important aspect of natural language understanding (NLU), and a variety of neural network models have been proposed for solving NLU tasks. Unfortunately, recent work has shown that the datasets these models are trained on often contain biases that allow models to achieve non-trivial performance without actually learning the relationship between the two texts. We propose a framework for building robust models that uses adversarial learning to encourage models to learn latent, bias-free representations. We test our approach in a Natural Language Inference (NLI) setting and show that our adversarially trained models learn robust representations that ignore known dataset-specific biases. Our experiments demonstrate that our models are more robust when transferred to new NLI datasets.
  • Keywords: natural language inference, adversarial learning, bias, artifacts
  • TL;DR: Adversarial learning methods encourage NLI models to ignore dataset-specific biases and help models transfer across datasets.
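The adversarial idea in the abstract can be illustrated with a minimal sketch (this is not the paper's actual architecture; the linear layers, synthetic data, loss, and hyperparameters below are all illustrative assumptions): a shared encoder feeds both a task classifier and an adversary that tries to predict a known bias attribute. The heads each descend their own loss, but the encoder descends the task loss while *ascending* the adversary's loss (a gradient-reversal-style update), so the learned representation keeps the task signal and hides the bias.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_debiased(X, y, b, d_rep=4, lr=0.2, lam=1.0, steps=3000, seed=0):
    """Linear encoder with a task head and an adversary head.

    The heads minimize their own logistic losses; the encoder minimizes
    the task loss but maximizes the adversary's loss (sign-flipped
    gradient), pushing bias information out of the representation.
    Sizes and hyperparameters are illustrative, not the paper's.
    """
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    W_enc = rng.normal(scale=0.5, size=(d_in, d_rep))  # encoder
    w_clf = rng.normal(scale=0.5, size=d_rep)          # task classifier head
    w_adv = rng.normal(scale=0.5, size=d_rep)          # bias adversary head
    for _ in range(steps):
        h = X @ W_enc                              # latent representation
        g_clf = (sigmoid(h @ w_clf) - y) / n       # task-loss gradient at logits
        g_adv = (sigmoid(h @ w_adv) - b) / n       # adversary-loss gradient
        w_clf -= lr * (h.T @ g_clf)                # heads: ordinary descent
        w_adv -= lr * (h.T @ g_adv)
        # Encoder: descend the task loss, ascend the adversary loss.
        W_enc -= lr * (X.T @ (np.outer(g_clf, w_clf)
                              - lam * np.outer(g_adv, w_adv)))
    return W_enc, w_clf, w_adv

# Toy data: column 0 is a dataset "bias" attribute the task does not need;
# columns 1-2 carry the genuine signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(512, 8))
y = (X[:, 1] + X[:, 2] > 0).astype(float)  # task label
b = (X[:, 0] > 0).astype(float)            # bias attribute to hide

W_enc, w_clf, w_adv = train_debiased(X, y, b)
h = X @ W_enc
task_acc = np.mean((h @ w_clf > 0) == (y > 0.5))
adv_acc = np.mean((h @ w_adv > 0) == (b > 0.5))
print(f"task accuracy {task_acc:.2f}, adversary accuracy {adv_acc:.2f}")
```

Without the reversed term (`lam=0`) the adversary can read the bias attribute straight off the representation; with it, the encoder is pressured to discard the bias direction while keeping the task signal.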