Abstract: A good representation of knowledge is one that can answer a set of useful questions about some aspects of future experience. One use of questions is as auxiliary tasks; by learning to answer auxiliary-task questions, the agent rapidly builds a representation that supports a main reinforcement learning (RL) task. A major outstanding issue is how to discover useful questions directly from experience. We propose a solution to the discovery problem, based on a principled non-myopic meta-gradient procedure, that explicitly optimises the usefulness of the questions via the induced representation’s effectiveness in solving the main RL task. We apply our meta-gradient discovery algorithm to the class of generalized value functions (GVFs). We directly evaluate the ability of the discovered questions to lead to good representations by stopping the gradient from the main RL task from adapting the representation. We show that the auxiliary tasks based on the discovered questions lead to representations that support main-task learning, and that they do so better than hand-designed auxiliary tasks.
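The procedure described in the abstract can be sketched in JAX. This is a minimal, hypothetical illustration, not the authors' implementation: all names, shapes, and the toy regression objective are assumptions. The key structure it shows is the one the abstract names: a question network `theta` defines GVF-style cumulant targets, the representation `phi` is adapted only by the auxiliary loss, the main-task gradient updates the head `w` alone (so it is stopped from adapting the representation), and the meta-gradient differentiates the main-task loss with respect to `theta` through an unrolled sequence of inner updates (the non-myopic part).

```python
import jax
import jax.numpy as jnp


def repr_fn(phi, x):
    # representation network (toy: a single tanh layer)
    return jnp.tanh(x @ phi)


def questions(theta, x):
    # question network: GVF-style cumulant targets parameterised by theta
    return x @ theta


def aux_loss(phi, theta, x):
    # auxiliary task: the representation learns to answer the questions
    return jnp.mean((repr_fn(phi, x) - questions(theta, x)) ** 2)


def main_loss(phi, w, x, y):
    # main task reads the representation through the head w
    return jnp.mean((repr_fn(phi, x) @ w - y) ** 2)


def meta_loss(theta, phi, w, x, y, lr=0.1, inner_steps=3):
    # non-myopic unroll: several inner updates, then score on the main task
    for _ in range(inner_steps):
        # the representation is adapted ONLY by the auxiliary loss ...
        phi = phi - lr * jax.grad(aux_loss, argnums=0)(phi, theta, x)
        # ... while the main-task gradient updates the head w alone,
        # i.e. it is stopped from adapting the representation
        w = w - lr * jax.grad(main_loss, argnums=1)(phi, w, x, y)
    return main_loss(phi, w, x, y)


key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (32, 8))            # toy observations
y = jnp.sum(x[:, :2], axis=1, keepdims=True)  # toy main-task target
phi = 0.1 * jax.random.normal(k2, (8, 4))     # representation params
theta = 0.1 * jax.random.normal(k3, (8, 4))   # question params (what to ask)
w = 0.1 * jnp.ones((4, 1))                    # main-task head

# meta-gradient: how changing the questions changes main-task performance,
# differentiated through the unrolled inner updates of phi and w
meta_grad = jax.grad(meta_loss)(theta, phi, w, x, y)
print(meta_grad.shape)
```

Because the inner loop is unrolled in Python, `jax.grad` computes the second-order derivative through each inner `jax.grad` step automatically; the resulting `meta_grad` has the same shape as `theta` and would drive an outer optimiser over the questions.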
CMT Num: 4978