- Keywords: Synthetic Environments, Synthetic Data, Meta-Learning, Reinforcement Learning, Evolution Strategies, Reward Shaping
- Abstract: We explore meta-learning agent-agnostic neural Synthetic Environments (SEs) and Reward Networks (RNs) as proxy models for training Reinforcement Learning (RL) agents. While an SE acts as a full proxy to a real environment by learning its state dynamics and rewards, an RN is a partial proxy that learns to augment or replace rewards. We use bi-level optimization to evolve SEs and RNs: the inner loop trains the RL agent, and the outer loop trains the parameters of the SE / RN via an evolution strategy. We evaluate our proposed concept of learning SEs / RNs on a broad range of RL algorithms and classic control environments. In a one-to-one comparison, learning an SE proxy requires more interactions with the real environment than training agents on the real environment alone. However, once such an SE proxy has been learned, no further interactions with the real environment are needed to train new agents. Moreover, the learned SE proxies allow us to train agents with fewer interactions while maintaining the original task performance. Our empirical results suggest that SEs achieve this by learning informed representations that bias the agents towards relevant states, which also makes the learned representations surprisingly interpretable. Finally, we find that these proxies are robust against hyperparameter variation and can transfer to unseen agents.
- One-sentence Summary: We propose an evolution-based approach to meta-learn synthetic neural environments and reward neural networks for reinforcement learning.
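The bi-level scheme described in the abstract can be illustrated with a deliberately minimal sketch. Everything here is a hypothetical toy, not the paper's actual setup: the "real environment" is a 1-D reward surface peaking at action 3.0, the SE is reduced to a single parameter `theta` defining its own reward peak, and the inner-loop agent training has a closed-form solution (the trained agent simply acts at the SE's peak). The outer loop is an OpenAI-style evolution strategy that perturbs the SE parameter, scores each perturbation by the real-environment performance of the agent trained on that perturbed SE, and updates `theta` from the fitness-weighted noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_reward(action):
    # Toy "real" environment: reward is maximal at action = 3.0.
    return -(action - 3.0) ** 2

def train_agent_on_se(theta):
    # Inner loop: train an agent purely on the synthetic environment.
    # The toy SE rewards actions near theta, so the optimal trained
    # agent acts at theta (closed-form stand-in for RL training).
    return theta

theta = 0.0                    # meta-learned SE parameter
sigma, lr, pop = 0.1, 0.05, 32  # ES noise scale, learning rate, population size

for _ in range(300):
    # Outer loop: evolution strategy over the SE parameter.
    eps = rng.standard_normal(pop)
    # Fitness of each perturbed SE = real-env return of the agent it produces.
    fitness = np.array([real_reward(train_agent_on_se(theta + sigma * e))
                        for e in eps])
    # Rank-free fitness standardization, then the usual ES gradient estimate.
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    theta += lr / (pop * sigma) * np.dot(fitness, eps)
```

After the outer loop, `theta` drifts toward 3.0: the evolved synthetic environment rewards exactly the behavior that pays off in the real environment, even though the agent never trains on the real environment directly. The key design point mirrored from the abstract is that only the outer loop touches real-environment rewards; the inner loop sees the SE alone.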