Keywords: reinforcement learning, synthetic data, synthetic environments, imitation learning, in-context learning, meta-learning
TL;DR: We train in-context imitation learners only on synthetic environments, and they match the out-of-distribution (OOD) performance of agents trained on real environments.
Abstract: Current AI models are trained on huge datasets of real-world data.
This is increasingly true in RL, with generalist agents being trained on data from hundreds of real environments.
It is widely assumed that real data and real environments are the only way to capture the intricate complexities of real-world RL tasks.
In this paper, we challenge this notion by training generalist in-context decision-making agents only on data generated by simple random processes.
We investigate data generated from eight different families of synthetic environments ranging from Markov chains and bandits to discrete, continuous, and hybrid Markov decision processes (MDPs).
Surprisingly, the resulting agents perform comparably to agents trained on real environment data.
We additionally analyze which properties of the pretraining MDPs are ideal for creating good agents, giving RL practitioners insight into which environments to train on.
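
To make "data generated by simple random processes" concrete, here is a minimal sketch of one such process: a randomly sampled discrete MDP and a trajectory collected from it. This is an illustration under stated assumptions, not the generator used in the paper. The function names `sample_random_mdp` and `rollout`, the flat Dirichlet transition prior, the Gaussian rewards, and the uniform random behavior policy are all hypothetical choices made for this example.

```python
import random


def sample_random_mdp(n_states, n_actions, rng):
    """Sample a random tabular MDP (assumed form, not the paper's generator).

    Returns P and R, where P[s][a] is a probability distribution over next
    states and R[s][a] is a scalar reward.
    """
    P, R = [], []
    for _ in range(n_states):
        p_row, r_row = [], []
        for _ in range(n_actions):
            # Normalized Gamma(1, 1) draws give a flat Dirichlet sample.
            weights = [rng.gammavariate(1.0, 1.0) for _ in range(n_states)]
            total = sum(weights)
            p_row.append([w / total for w in weights])
            # Assumption: rewards are standard normal.
            r_row.append(rng.gauss(0.0, 1.0))
        P.append(p_row)
        R.append(r_row)
    return P, R


def rollout(P, R, horizon, rng):
    """Collect (state, action, reward, next_state) tuples from the MDP.

    Assumption: a uniform random policy stands in for whatever demonstrator
    would supply the actions in an imitation-learning setup.
    """
    n_states, n_actions = len(P), len(P[0])
    state = rng.randrange(n_states)
    trajectory = []
    for _ in range(horizon):
        action = rng.randrange(n_actions)
        next_state = rng.choices(range(n_states), weights=P[state][action])[0]
        trajectory.append((state, action, R[state][action], next_state))
        state = next_state
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    P, R = sample_random_mdp(n_states=5, n_actions=3, rng=rng)
    for step in rollout(P, R, horizon=5, rng=rng):
        print(step)
```

Sampling many such MDPs with different sizes and seeds would yield a pretraining corpus of trajectories; the other families mentioned above (Markov chains, bandits, continuous and hybrid MDPs) would need their own generators.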
Submission Number: 34