- Abstract: While reinforcement learning (RL) shows a lot of promise for natural language processing—e.g. when fine-tuning natural language systems for optimizing a certain objective—there has been little investigation into potential language drift: when an external reward is used to train a system, the agents’ communication protocol may easily and radically diverge from natural language. By re-casting translation as a communication game, we show that language drift indeed happens when pre-trained agents are fine-tuned with policy gradient methods. We contend that simply adding a "naturalness" constraint to the reward, e.g. by using language model log likelihood, does not fully address the issue, and argue that (perceptual) grounding is required. That is, while language model constraints impose syntactic conformity, they do not lead to semantic correspondence. Our experiments show that grounded models give the best communication performance, while retaining English syntax along with the ability to convey the intended semantics.
- Keywords: grounding, policy gradient, language drift, reinforcement learning
- TL;DR: Grounding helps avoid language drift during fine-tuning natural language agents with policy gradients.