Language Decision Transformers with Exponential Tilt for Interactive Text Environments

Nicolas Gontier; Pau Rodriguez; Issam H. Laradji; David Vazquez; Christopher Pal

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Decision Transformers, Transformers, Language, Jericho, Text Games, Reinforcement Learning

TL;DR: We present Language Decision Transformers (LDTs) with exponential tilt as an offline RL method to solve text-based games.

Abstract: Text-based game environments are challenging because agents must deal with long sequences of text, execute compositional actions using text, and learn from sparse rewards. We address these challenges by proposing Language Decision Transformers (LDTs), a framework that is based on transformer language models and decision transformers (DTs). Our LDTs extend DTs with three components: (1) exponential tilt to guide the agent towards high but obtainable goals, (2) novel goal conditioning methods yielding better results than the traditional return-to-go (sum of all future rewards), and (3) a model of future observations that improves agent performance. LDTs are the first to address offline RL with DTs on these challenging games. Our experiments show that LDTs achieve the highest scores among many different types of agents on some of the most challenging Jericho games, such as Enchanter.
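
As a rough illustration of two terms used in the abstract, the sketch below computes a return-to-go sequence and applies a generic exponential tilt to a discrete distribution over goal values. It is not the authors' implementation: the function names, the convention that the return-to-go at a step includes that step's reward, and the tilt form p(g) * exp(alpha * g) with a hypothetical strength parameter alpha are all assumptions made for this example.

```python
import math


def returns_to_go(rewards):
    """Return-to-go per step: that step's reward plus all later rewards.

    Assumption: the current step's reward is included in the sum.
    """
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def exponential_tilt(probs, goals, alpha):
    """Reweight a discrete distribution p(g) by exp(alpha * g), then renormalize.

    Assumption: this is a generic exponential tilt; alpha is a hypothetical
    parameter controlling how strongly mass shifts toward larger goals.
    """
    # Subtract the largest exponent before exponentiating for numerical stability.
    m = max(alpha * g for g in goals)
    weights = [p * math.exp(alpha * g - m) for p, g in zip(probs, goals)]
    total = sum(weights)
    return [w / total for w in weights]


# Sparse rewards: most steps yield nothing.
rtg = returns_to_go([0, 0, 5, 0, 10])

# Tilting shifts probability mass toward the higher goal value.
tilted = exponential_tilt([0.5, 0.5], [0, 10], alpha=1.0)
```

With alpha set to 0 the distribution is unchanged, while larger values concentrate mass on the highest goals that still have nonzero probability, which is one way to read "high but obtainable".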

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 5790
