Assumption Questioning: Latent Copying and Reward Exploitation in Question Generation

27 Sept 2018, 22:36 (edited 21 Dec 2018) | ICLR 2019 Conference Blind Submission | Readers: Everyone
  • Keywords: question generation, answer questioning, pointer networks, reinforcement learning
  • TL;DR: An investigation into latent copy mechanisms for question generation and correlations of external reward models with human evaluation.
  • Abstract: Question generation is an important task for improving our ability to process natural language data, and it poses additional challenges compared with other sequence transformation tasks. Recent approaches use modifications to a Seq2Seq architecture inspired by advances in machine translation, but unlike translation, the input and output vocabularies overlap significantly, and there are many different valid questions for each input. Approaches using copy mechanisms and reinforcement learning have shown promising results, but there are ambiguities in the exact implementation that have not yet been investigated. We show that by removing inductive bias from the model and allowing the choice of generation path to become latent, we achieve substantial improvements over implementations biased with both naive and smart heuristics. We perform a human evaluation to confirm these findings. We show that although policy gradient methods may be used to decouple training from the ground truth and optimise directly for quality metrics that have previously been assumed to be good choices, these objectives are poorly aligned with human judgement, and the model simply learns to exploit the weaknesses of the reward source. Finally, we show that an adversarial objective learned directly from the ground truth data is not able to generate a useful training signal.