An Actor-Critic Algorithm for Sequence Prediction

Dzmitry Bahdanau; Philemon Brakel; Kelvin Xu; Anirudh Goyal; Ryan Lowe; Joelle Pineau; Aaron Courville; Yoshua Bengio

Published: 06 Feb 2017 | Last Modified: 22 Jun 2025 | ICLR 2017 Poster | Readers: Everyone

Abstract: We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes: at test time, models must generate tokens conditioned on their previous guesses rather than on the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task and German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.
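The interplay between actor and critic described in the abstract can be illustrated with a minimal sketch for a single decoding step. This is not the authors' implementation: the function names, the use of NumPy, the choice to differentiate with respect to the actor's logits, and the one-step temporal-difference target are all assumptions made for illustration. The sketch treats the critic's output as a vector of per-token value estimates and shows how those estimates could weight the actor's update.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def actor_logit_gradient(logits, q_values):
    """Gradient of the expected value sum_a p(a) * Q(a) w.r.t. the actor's logits.

    logits   : actor scores for each candidate token (hypothetical input)
    q_values : critic's value estimate for each candidate token (hypothetical input)
    """
    probs = softmax(logits)
    expected_value = float(probs @ q_values)
    # For a softmax policy, d/dz_i sum_a p(a) Q(a) = p_i * (Q_i - expected_value).
    grad = probs * (q_values - expected_value)
    return expected_value, grad

def critic_td_target(reward, next_logits, next_q_values):
    """One-step TD target (assumed form): immediate reward plus the expected
    critic value of the next step under the actor's next-step policy."""
    next_probs = softmax(next_logits)
    return reward + float(next_probs @ next_q_values)

# Example: three candidate tokens, uniform actor, critic prefers the last token.
value, grad = actor_logit_gradient(np.zeros(3), np.array([1.0, 2.0, 3.0]))
target = critic_td_target(0.5, np.zeros(2), np.array([1.0, 3.0]))
```

In this toy setting, gradient ascent on the logits moves probability mass toward tokens whose estimated value exceeds the current expectation, which is how a task score such as BLEU could influence the actor once it has been turned into per-step rewards.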

TL;DR: Adapting Actor-Critic methods from reinforcement learning to structured prediction

Conflicts: umontreal.ca, google.com, mcgill.ca

Keywords: Natural language processing, Deep learning, Reinforcement Learning, Structured prediction

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/an-actor-critic-algorithm-for-sequence/code)
