Teacher Forcing Recovers Reward Functions for Text Generation

Yongchang Hao; Yuxin Liu; Lili Mou

Published: 31 Oct 2022. Last Modified: 06 Apr 2025. Venue: NeurIPS 2022 (Accept). Readers: Everyone.

Keywords: Text Generation, Natural Language Processing, Reinforcement Learning

TL;DR: We derive a reward function for text generation via the lens of inverse reinforcement learning.

Abstract: Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The reward function plays an important role in making RL training successful. However, previous reward functions are typically task-specific and sparse, restricting the use of RL. In our work, we propose a task-agnostic approach that derives a step-wise reward function directly from a model trained with teacher forcing. We additionally propose a simple modification to stabilize the RL training on non-parallel datasets with our induced reward function. Empirical results show that our method outperforms self-training and reward regression methods on several text generation tasks, confirming the effectiveness of our reward function.

Supplementary Material: pdf

Community Implementations: [1 code implementation on CatalyzeX](https://www.catalyzex.com/paper/teacher-forcing-recovers-reward-functions-for/code)
