Abstract: This paper builds on Open-Ended Reinforcement Learning with Neural Reward Functions proposed by Meier and Mujika [1], which uses reward functions encoded by neural networks. A key limitation of their approach is that the policy must be re-learned for each new skill the agent acquires. We therefore propose integrating meta-learning algorithms to address this problem, and study the use of Model-Agnostic Meta-Learning (MAML), which we believe could make policy learning more efficient. MAML operates by learning an initialization of the model parameters that can be fine-tuned with a few examples from a new task, allowing rapid adaptation to new tasks.
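As a brief sketch of this mechanism (using the standard MAML notation of Finn et al., with inner learning rate \alpha, task loss \mathcal{L}_{T_i}, and task distribution p(\mathcal{T}), none of which are notation introduced in this paper), MAML takes an inner gradient step per task and optimizes the shared initialization through that step:
\[
\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{T_i}(f_{\theta}), \qquad
\min_{\theta} \sum_{T_i \sim p(\mathcal{T})} \mathcal{L}_{T_i}\big(f_{\theta_i'}\big).
\]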