Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Wei Zhou; Yiying Li; Yongxin Yang; Huaimin Wang; Timothy M. Hospedales

25 Sept 2019 (modified: 26 May 2025), ICLR 2020 Conference Blind Submission, Readers: Everyone

Keywords: off-policy actor-critic, reinforcement learning, meta-learning

TL;DR: We present Meta-Critic, an auxiliary critic module for off-policy actor-critic methods that can be meta-learned online during single-task learning.

Abstract: Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic’s action-value function is updated using temporal-difference learning, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a novel and flexible meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to the vanilla critic, the meta-critic network is explicitly trained to accelerate the learning process; compared to existing meta-learning algorithms, the meta-critic is learned rapidly online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic framework is designed for off-policy learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning leads to improvements in a variety of continuous control environments when combined with the contemporary Off-PAC methods DDPG, TD3 and the state-of-the-art SAC.

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/online-meta-critic-learning-for-off-policy/code)
