Alpha-divergence bridges maximum likelihood and reinforcement learning in neural sequence generation


Nov 07, 2017 (modified: Nov 07, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: Neural sequence generation is commonly approached with maximum-likelihood (ML) estimation or reinforcement learning (RL). Each has known shortcomings: ML suffers from a discrepancy between training and testing, whereas RL suffers from sample inefficiency. We point out that it is difficult to resolve all of these shortcomings simultaneously because of a tradeoff between ML and RL. To address these problems, we propose an objective function for sequence generation based on α-divergence, which leads to an integrated ML-RL method that exploits the better parts of both. We show that the proposed objective function generalizes the ML and RL objective functions, since it includes both as special cases (ML corresponds to α → 0 and RL to α → 1). We provide a proposition stating that the difference between the RL objective function and the proposed one decreases monotonically with increasing α. Experimental results on machine translation tasks show that minimizing the proposed objective function achieves better sequence generation performance than ML-based methods.
  • TL;DR: We propose a new objective function for neural sequence generation that integrates ML-based and RL-based objective functions.
  • Keywords: neural network, reinforcement learning, natural language processing, machine translation, alpha-divergence
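
The abstract's claim that one α-parameterized objective recovers two different objectives in its limits can be illustrated numerically. The sketch below is not the paper's objective: it assumes a standard Amari-style α-divergence between two toy discrete distributions, and the names `model` and `target` (and the choice of which one plays which role) are hypothetical stand-ins for a sequence model and a reward-derived target distribution. Under that assumed convention, α → 0 approaches KL(target ‖ model), the direction usually associated with likelihood-style training, and α → 1 approaches KL(model ‖ target), the direction usually associated with entropy-regularized RL.

```python
import numpy as np


def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def alpha_divergence(p, q, alpha):
    """Amari-style alpha-divergence, defined here for 0 < alpha < 1.

    Assumed form (not taken from the abstract):
        D_alpha(p || q) = (1 - sum_i p_i^alpha * q_i^(1 - alpha)) / (alpha * (1 - alpha))
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    overlap = np.sum(p ** alpha * q ** (1.0 - alpha))
    return float((1.0 - overlap) / (alpha * (1.0 - alpha)))


# Hypothetical toy distributions over three candidate outputs.
model = np.array([0.7, 0.2, 0.1])
target = np.array([0.5, 0.3, 0.2])

near_zero = alpha_divergence(model, target, 1e-4)      # close to KL(target || model)
near_one = alpha_divergence(model, target, 1 - 1e-4)   # close to KL(model || target)

print(near_zero, kl(target, model))
print(near_one, kl(model, target))
```

Intermediate values of α interpolate between the two KL directions, which is the sense in which a single objective of this kind can sit between ML-style and RL-style training.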