Keywords: In-Context Reinforcement Learning
Abstract: Transformers trained on offline expert-level data have shown remarkable success in In-Context Reinforcement Learning (ICRL), enabling effective decision-making in unseen environments. However, the performance of these models depends heavily on optimal or expert-level trajectories, which are expensive to collect in many real-world scenarios. In this work, we introduce Q-Target Pretrained Transformers (QTPT), a novel framework that leverages Q-learning instead of supervised learning during the training stage. In particular, QTPT requires neither optimally labeled actions nor expert trajectories, providing a practical solution for real-world applications. We theoretically establish a performance guarantee for QTPT and show its superior robustness to data quality compared with traditional supervised learning approaches. In comprehensive empirical evaluations, QTPT consistently outperforms existing approaches, especially when trained on data sampled from non-expert policies.
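To illustrate the core idea described in the abstract, the sketch below contrasts Q-learning-style pretraining with supervised action labels: the loss is built from a Bellman (TD) target computed on sampled transitions, so no optimal action labels are required. This is a minimal, hypothetical sketch and not the authors' implementation; the model class `ContextQTransformer`, its dimensions, and the batch layout are assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' code): a small transformer Q-network
# trained with a TD target instead of supervised action labels. All names
# (ContextQTransformer, d_model, etc.) are illustrative assumptions.
import torch
import torch.nn as nn


class ContextQTransformer(nn.Module):
    """Maps a context of state tokens to Q-values over a discrete action set."""

    def __init__(self, state_dim: int, num_actions: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(d_model, num_actions)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (batch, context_len, state_dim); Q-values read from last token
        h = self.encoder(self.embed(states))
        return self.q_head(h[:, -1])


def q_target_loss(q_net, target_net, batch, gamma: float = 0.99):
    """TD loss on sampled transitions: no expert action labels needed."""
    states, actions, rewards, next_states = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bellman target y = r + gamma * max_a' Q_target(s', a')
        target = rewards + gamma * target_net(next_states).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target)
```

By contrast, a supervised (behavior-cloning) loss would require labeled actions from an expert policy, which is exactly the dependence the abstract argues QTPT avoids.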
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 18814