Continual Auxiliary Task Learning

Matthew K McLeod; Chunlok Lo; Matthew Kyle Schlegel; Andrew Jacobsen; Raksha Kumaraswamy; Martha White; Adam M White

Published: 09 Nov 2021, Last Modified: 05 May 2023; NeurIPS 2021 Poster; Readers: Everyone

Keywords: Reinforcement Learning, Never-Ending Learning, Auxiliary Tasks, Intrinsic Motivation, Off-Policy Learning, Successor Features, General Value Functions

TL;DR: Adapting behaviour for auxiliary task learning

Abstract: Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems. A variety of off-policy learning algorithms have been developed to learn such predictions, but as yet there is little work on how to adapt the behavior to gather useful data for those off-policy predictions. In this work, we investigate a reinforcement learning system designed to learn a collection of auxiliary tasks, with a behavior policy learning to take actions to improve those auxiliary predictions. We highlight the inherent non-stationarity in this continual auxiliary task learning problem, for both prediction learners and the behavior learner. We develop an algorithm based on successor features that facilitates tracking under non-stationary rewards, and prove the separation into learning successor features and rewards provides convergence rate improvements. We conduct an in-depth study into the resulting multi-prediction learning system.
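The separation the abstract refers to can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fixed policy, one-hot state features, expected (synchronous) TD updates, and a hypothetical 3-state transition matrix `P`. All function and variable names are illustrative. The idea shown is that a value estimate factors as `psi @ w`, so when the reward changes only the reward weights `w` need to be re-learned while the successor features `psi` are reused.

```python
import numpy as np

def learn_successor_features(P, gamma, alpha=0.5, sweeps=500):
    """Expected TD updates for successor features under a fixed policy.

    P is the (n, n) state-transition matrix induced by the policy.
    One-hot state features are an assumption made for this sketch.
    """
    n = P.shape[0]
    phi = np.eye(n)              # one-hot feature vector per state
    psi = np.zeros((n, n))       # row s holds the successor features of state s
    for _ in range(sweeps):
        td_error = phi + gamma * (P @ psi) - psi
        psi += alpha * td_error
    return psi

def track_reward_weights(w, observed_rewards, alpha=0.5, steps=60):
    """Stochastic-gradient-style regression of rewards onto features.

    With one-hot features the prediction phi @ w equals w, so the
    update reduces to moving w toward the observed rewards.
    """
    for _ in range(steps):
        w = w + alpha * (observed_rewards - w)
    return w

gamma = 0.9
# Hypothetical 3-state chain under a fixed policy (rows sum to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
psi = learn_successor_features(P, gamma)

r_old = np.array([1.0, 0.0, 0.0])
r_new = np.array([0.0, 0.0, 2.0])    # the reward changes (non-stationarity)

w = track_reward_weights(np.zeros(3), r_old)
v_old = psi @ w                       # values under the old reward
w = track_reward_weights(w, r_new)    # only the weights are re-learned
v_new = psi @ w                       # psi is reused unchanged
```

In this toy setting the reward regression is a much simpler problem than re-estimating values from scratch, which gives some intuition for why tracking a changing reward through `w` can be faster than re-learning a full value function.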

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Code: https://github.com/MatthewMcLeod/curiosity
