Language Model-In-The-Loop: Data Optimal Approach to Recommend Actions in Text Games

Arjun V Sudhakar; Prasanna Parthasarathi; Janarthanan Rajendran; Sarath Chandar

Published: 03 Jul 2024, Last Modified: 03 Jul 2024. Venue: ICML 2024 FM-Wild Workshop Poster. License: CC BY 4.0

Keywords: Natural Language Processing, Reinforcement Learning, Text-Based Games

TL;DR: We propose in-game transition selection methods to adapt the LLM in the loop, reducing the dependency on human-annotated gameplays while improving performance and convergence.

Abstract: Large Language Models (LLMs) have demonstrated superior performance on language understanding benchmarks. A recent use case for LLMs involves training decision-making agents over textual information. The existing approach leverages the LLM's linguistic priors to recommend action candidates in text games, i.e., to operate without environment-provided actions. However, adapting LLMs to specific games/tasks requires a massive amount of annotated human gameplay. Moreover, in the existing approach, the language model is kept frozen during the agent's training process, which limits learning from in-game knowledge about the world. Hence, we explore strategies to adapt the language model for candidate recommendation using in-game transitions in an online learning fashion, mitigating reliance on human-annotated gameplays, which are costly to acquire. In this paper, we propose in-game transition selection methods to adapt the LLM in the loop, reducing the dependency on human-annotated gameplays while improving performance and convergence. Our method demonstrates a 53% relative improvement in average game score over the previous state-of-the-art model, achieving more than twice the convergence rate in the fully annotated dataset setting. Furthermore, even with only 10% of the human annotations, our method surpasses the state-of-the-art model trained on 100% of them.

Submission Number: 111
