Meta-Gradients in Non-Stationary Environments

Jelena Luketina; Sebastian Flennerhag; Yannick Schroecker; David Abel; Tom Zahavy; Satinder Singh

Published: 23 Apr 2022, Last Modified: 05 May 2023 | ALOE@ICLR2022 | Readers: Everyone

Keywords: meta-gradients, learning optimizers, continual learning, non-stationary environments, single lifetime, reinforcement learning

TL;DR: We study meta-gradients in non-stationary environments, focusing on the interplay of information available to the meta-learner and the rate of non-stationarity.

Abstract: Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) are a promising approach to the problem of adaptation of hyper-parameters in non-stationary reinforcement learning problems. Recent works enable meta-gradients to adapt faster and learn from experience, by replacing the tuned meta-parameters of fixed update rules with learned meta-parameter functions of selected context features (Almeida et al., 2021; Flennerhag et al., 2022). We refer to these methods as contextual meta-gradients. The context features carry information about agent performance and changes in the environment and hence can inform learned meta-parameter schedules. As the properties of meta-gradient methods in non-stationary environments have not been systematically studied, the aim of this work is to provide such an analysis. Concretely, we ask: (i) how much information should be given to the learned optimizers so as to enable faster adaptation and generalization over a lifetime, (ii) what meta-optimizer functions are learned in this process, and (iii) whether meta-gradient methods provide a bigger advantage in highly non-stationary environments. We find that adding more contextual information is generally beneficial, leading to faster adaptation of meta-parameter values and increased performance. We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features. Lastly, we find that without context, meta-gradients do not provide a consistent advantage over the baseline in highly non-stationary environments. Our findings suggest that contextualising meta-gradients can play a pivotal role in extracting high performance from meta-gradients in non-stationary settings.
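To make the idea of a contextual meta-gradient concrete, here is a minimal sketch on a toy problem. It is not the authors' experimental setup: the one-dimensional tracking task, the choice of gradient magnitude as the only context feature, the one-step (truncated) meta-objective, and every name in the code (`step_size`, `inner_update`, `meta_gradient`, `run_lifetime`) are assumptions made purely for illustration. The step size is a learned function of the context, and its parameters `eta = (w, b)` are updated by differentiating the loss after the inner update. Setting `use_context=False` zeroes the context feature, which leaves only the bias `b` to be learned, that is, a single context-free step size.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function (tanh form avoids overflow in exp).
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def step_size(eta, context):
    """Hypothetical meta-parameter function: step size in (0, 1) from context."""
    w, b = eta
    return sigmoid(w * context + b)


def inner_update(theta, target, eta, use_context=True):
    """One gradient step on the inner loss 0.5 * (theta - target)**2."""
    grad = theta - target
    # Assumed context feature: magnitude of the current gradient.
    context = abs(grad) if use_context else 0.0
    alpha = step_size(eta, context)
    return theta - alpha * grad, (grad, context, alpha)


def meta_gradient(theta, target, next_target, eta, use_context=True):
    """Gradient of the post-update loss 0.5 * (theta' - next_target)**2
    with respect to eta = (w, b), through a single inner update."""
    new_theta, (grad, context, alpha) = inner_update(theta, target, eta, use_context)
    d_loss_d_alpha = (new_theta - next_target) * (-grad)  # chain rule via theta'
    d_alpha_d_z = alpha * (1.0 - alpha)                   # sigmoid derivative
    return (d_loss_d_alpha * d_alpha_d_z * context,       # d/dw
            d_loss_d_alpha * d_alpha_d_z)                 # d/db


def run_lifetime(use_context, steps=2000, switch_every=100,
                 noise=0.5, meta_lr=0.01, seed=0):
    """Single lifetime on a noisy target whose mean jumps every `switch_every` steps."""
    rng = random.Random(seed)
    theta, eta, mean, total = 0.0, [0.0, 0.0], 0.0, 0.0
    target = mean + rng.gauss(0.0, noise)
    for t in range(1, steps + 1):
        if t % switch_every == 0:
            mean = rng.uniform(-5.0, 5.0)  # abrupt environment change
        next_target = mean + rng.gauss(0.0, noise)
        g_w, g_b = meta_gradient(theta, target, next_target, eta, use_context)
        theta, _ = inner_update(theta, target, eta, use_context)
        eta[0] -= meta_lr * g_w            # meta-update of the step-size function
        eta[1] -= meta_lr * g_b
        total += 0.5 * (theta - mean) ** 2
        target = next_target
    return total / steps, tuple(eta)


if __name__ == "__main__":
    for flag in (False, True):
        avg_loss, eta = run_lifetime(use_context=flag)
        print(f"use_context={flag}: avg tracking loss={avg_loss:.4f}, eta={eta}")
```

Running the script prints the average tracking loss and the learned `eta` for both variants. The outcome on this toy task depends on the assumed noise level, switching rate, and meta learning rate, so it should not be read as a reproduction of the paper's findings.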
