Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games

Published: 07 Oct 2023, Last Modified: 01 Dec 2023 — EMNLP 2023 Findings
Submission Type: Regular Short Paper
Submission Track: NLP Applications
Keywords: text games, reinforcement learning, behavior cloning, self-supervision
TL;DR: We demonstrate that behavior cloned transformers can self-supervise, using generalizability as a self-supervision signal, and achieve 90% of the performance of a supervised model on interactive text games.
Abstract: In this work, we introduce a self-supervised behavior cloning transformer for text games, which are challenging benchmarks for multi-step reasoning in virtual environments. Traditionally, behavior cloning transformers excel in such tasks but rely on supervised training data. Our approach auto-generates training data by exploring trajectories (defined by common macro-action sequences) that lead to reward within the games, while determining the generality and utility of these trajectories by rapidly training small models and then evaluating their performance on unseen development games. Through empirical analysis, we show our method consistently uncovers generalizable training data, achieving about 90\% of the performance of supervised systems across three benchmark text games.
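The abstract's self-supervision loop (explore trajectories that reach reward, rapidly train a small model on each candidate, and keep trajectories whose model generalizes to unseen development games) can be sketched as below. This is a minimal toy sketch, not the paper's implementation: every function, signature, and threshold here is an illustrative assumption, with trivial stand-ins for the exploration, training, and evaluation components.

```python
import random

# Toy stand-ins for the components described in the abstract; all names,
# signatures, and the threshold are illustrative assumptions, not the
# paper's actual implementation.

def explore_for_reward(game, n_candidates=3):
    """Pretend to explore a game, returning macro-action sequences that
    ended in reward (here: random pairs of canned actions)."""
    actions = ["open door", "take key", "go north", "read note"]
    return [tuple(random.sample(actions, k=2)) for _ in range(n_candidates)]

def train_small_model(trajectories):
    """Stand-in for rapidly training a small behavior-cloning
    transformer: the 'model' is just the deduplicated trajectory set."""
    return set(trajectories)

def evaluate(model, dev_games):
    """Stand-in for scoring the model on unseen development games:
    here, a toy score that grows with the amount of kept data."""
    return min(1.0, len(model) / 5)

def self_supervise(train_games, dev_games, threshold=0.2):
    """Keep only trajectories whose inclusion yields a model that
    generalizes (scores above threshold) on unseen dev games."""
    kept = []
    for game in train_games:
        for traj in explore_for_reward(game):
            model = train_small_model(kept + [traj])
            if evaluate(model, dev_games) >= threshold:
                kept.append(traj)  # generalizes: add to training data
    return kept

training_data = self_supervise(["game-a", "game-b"], dev_games=["game-c"])
```

The key idea preserved from the abstract is that generalization on held-out games, not human labels, decides which auto-generated trajectories enter the training set.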
Submission Number: 4385