Transformers can reinforcement learn to approximate Gittins Index

Published: 10 Oct 2024, Last Modified: 09 Nov 2024 | SciForDL Poster | CC BY 4.0
TL;DR: In this work, we train a transformer in an online RL manner and show that it can meta-learn the Gittins index.
Abstract: Transformers have demonstrated the ability to approximate in-context a rich class of functions in supervised learning and more recently in reinforcement learning (RL) settings. In this work, we investigate the transformer's ability to in-context learn the Gittins index, an online RL algorithm computed via dynamic programming (DP) and known to be optimal in Bayesian Bernoulli bandits. Our experiments show that the transformer can learn to approximate this strategy very well in a pure RL manner, without expert demonstrations, especially after we account for the problem's underlying symmetric properties. Our results, therefore, serve as empirical evidence that the class of RL algorithms transformers can learn in context extends to include certain DP-based algorithms.
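For readers unfamiliar with the DP computation the abstract refers to, the following is a minimal sketch of how a Gittins index for a single Bernoulli arm might be approximated. It is not the authors' code. It assumes a discounted infinite-horizon objective with discount `gamma`, a Beta(a, b) posterior over the arm's success probability, and the standard "retirement" calibration: the index is the per-step reward `lam` of a known arm at which playing the unknown arm once more and retiring immediately are equally attractive. The function names, the truncation depth `horizon`, and the tolerance `tol` are illustrative choices, not taken from the paper.

```python
def _continue_advantage(a, b, lam, gamma, horizon):
    """Value of pulling the Beta(a, b) arm once (then acting optimally)
    minus the value of retiring now on a known arm paying `lam` per step.

    The lookahead is truncated at depth `horizon` (an approximation)."""
    retire = lam / (1.0 - gamma)
    n = a + b
    # Terminal layer: after `horizon` pulls with s successes, commit forever
    # to whichever looks better, the known arm or the posterior mean.
    values = [
        max(retire, (a + s) / (n + horizon) / (1.0 - gamma))
        for s in range(horizon + 1)
    ]
    # Backward induction over depth d; values[s] is the state with s successes.
    for d in range(horizon - 1, -1, -1):
        layer = []
        for s in range(d + 1):
            p = (a + s) / (n + d)  # posterior mean at this state
            cont = p * (1.0 + gamma * values[s + 1]) + (1.0 - p) * gamma * values[s]
            # At the root we force one pull; deeper states may retire.
            layer.append(cont if d == 0 else max(retire, cont))
        values = layer
    return values[0] - retire


def gittins_index(a, b, gamma=0.9, horizon=50, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b) posterior.

    Bisection on `lam`: the advantage of continuing is decreasing in `lam`,
    so the index is the point where it crosses zero."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if _continue_advantage(a, b, lam, gamma, horizon) > 0.0:
            lo = lam
        else:
            hi = lam
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for a, b in [(1, 1), (2, 1), (1, 2)]:
        print((a, b), round(gittins_index(a, b), 3))
```

Under these assumptions, the index of an arm exceeds its posterior mean a / (a + b); the gap reflects the value of the information gained by pulling it. The index policy then pulls, at each step, the arm whose current index is largest.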
Style Files: I have used the style files.
Submission Number: 63