Contextual Transformers for Goal-Oriented Reinforcement Learning

Published: 01 Jan 2024, Last Modified: 02 Aug 2025, SGAI Conf. (1) 2024, CC BY-SA 4.0
Abstract: Transformer architectures have become popular across deep-learning disciplines due to their ability to efficiently integrate information across long temporal spans and to handle large datasets. Recently, this property of transformer models has also been exploited for reinforcement learning (RL) via in-context learning. In in-context learning for decision-making problems such as RL, a transformer model is typically pre-trained on an offline dataset and tasked with predicting the most likely action given a context. Such a model can perform inference on the fly, without parameter updates. Despite great success, the use of transformer architectures for RL is still in its infancy. In this paper, we further investigate the in-context learning abilities of transformer-based goal-oriented RL. We introduce the Goal-Focused Transformer (GFT), a transformer meta-agent for goal-oriented RL. Building upon the Decision-Pretrained Transformer (DPT), GFT incorporates a function, which we refer to as the "goal-controller" (gc), that distills goal information from the context and facilitates task inference during evaluation. By learning to distill useful information about goal states from the context, GFT improves exploration-exploitation dynamics and achieves superior performance and stability compared to DPT in environments with sparse rewards. Our contributions highlight GFT's efficacy in increasing average return, improving data efficiency, and providing a valuable mechanism for operating in dynamic environments while consistently striving to achieve predefined objectives.
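The abstract describes a DPT-style meta-agent augmented with a goal-controller that distills goal information from the in-context dataset. Since no architectural details are given here, the following PyTorch sketch is purely illustrative: the module names (GoalController, GFT), the attention-pooling design of the goal-controller, the transition encoding, and all dimensions are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class GoalController(nn.Module):
    """Distills a goal embedding from the context (assumed design:
    a learned query attending over the embedded context transitions)."""
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, context_len, dim) -> goal token: (batch, 1, dim)
        q = self.query.expand(context.size(0), -1, -1)
        goal, _ = self.attn(q, context, context)
        return goal

class GFT(nn.Module):
    """DPT-style meta-agent: given a context of past transitions and a
    query state, predicts action logits, conditioned on the distilled goal."""
    def __init__(self, state_dim: int, act_dim: int, dim: int = 128):
        super().__init__()
        # each context token is a transition (s, a, r, s')
        self.embed_transition = nn.Linear(2 * state_dim + act_dim + 1, dim)
        self.embed_query = nn.Linear(state_dim, dim)
        self.gc = GoalController(dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.action_head = nn.Linear(dim, act_dim)

    def forward(self, context: torch.Tensor, query_state: torch.Tensor):
        # context: (batch, T, 2*state_dim + act_dim + 1); query_state: (batch, state_dim)
        ctx = self.embed_transition(context)
        goal = self.gc(ctx)                                # distilled goal token
        q = self.embed_query(query_state).unsqueeze(1)
        h = self.encoder(torch.cat([goal, ctx, q], dim=1)) # prepend goal, append query
        return self.action_head(h[:, -1])                  # logits at the query position

At evaluation time such a model acts in-context: new transitions are appended to the context and the forward pass is repeated, with no gradient updates, matching the "inference on the fly" behavior described above.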