Imitation Transformer

TMLR Paper1860 Authors

23 Nov 2023 (modified: 05 Apr 2024) · Rejected by TMLR
Abstract: We propose a simple but effective batch imitation learning method. Our algorithm works by solving a sequence of two supervised learning problems: first learning a reward function, then using a batch reinforcement learning oracle to learn a policy. We develop a highly scalable implementation using the transformer architecture and upside-down reinforcement learning. We also analyze an idealized variant of the algorithm for the tabular case and provide a finite-data regret bound. Experiments on a set of Atari games and MuJoCo continuous control tasks demonstrate good empirical performance.
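The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it substitutes small MLPs for the transformer, uses synthetic data with hypothetical shapes, assumes the reward is learned by classifying expert versus non-expert transitions, and instantiates the batch RL oracle as return-conditioned (upside-down RL) supervised policy learning.

```python
# Minimal two-stage sketch under illustrative assumptions (not the paper's method):
# Stage 1 learns a reward by supervised learning; Stage 2 relabels the offline
# batch and trains a return-conditioned policy via upside-down RL.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, N = 8, 4, 1024

# Offline batch: expert and non-expert transitions (synthetic placeholders here).
states = torch.randn(N, STATE_DIM)
actions = torch.randint(0, ACT_DIM, (N,))
expert_mask = torch.rand(N) < 0.5  # which transitions come from the expert

# Stage 1: supervised reward learning, e.g. classify expert vs. non-expert
# state-action pairs and use the logit as a surrogate reward (an assumption).
sa = torch.cat([states, nn.functional.one_hot(actions, ACT_DIM).float()], dim=-1)
reward_net = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
opt_r = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
for _ in range(200):
    logits = reward_net(sa).squeeze(-1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, expert_mask.float())
    opt_r.zero_grad(); loss.backward(); opt_r.step()

# Stage 2: relabel the batch with learned rewards, then fit a policy conditioned
# on the reward signal by supervised learning (upside-down RL style).
with torch.no_grad():
    learned_rewards = reward_net(sa).squeeze(-1)  # stands in for returns in this sketch
policy = nn.Sequential(nn.Linear(STATE_DIM + 1, 64), nn.ReLU(), nn.Linear(64, ACT_DIM))
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    inp = torch.cat([states, learned_rewards.unsqueeze(-1)], dim=-1)
    loss = nn.functional.cross_entropy(policy(inp), actions)
    opt_p.zero_grad(); loss.backward(); opt_p.step()

# At test time, condition on a high target return to elicit expert-like behavior.
target_return = learned_rewards.max().reshape(1, 1)
action = policy(torch.cat([torch.randn(1, STATE_DIM), target_return], dim=-1)).argmax(-1)
```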
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Steven_Stenberg_Hansen1
Submission Number: 1860