Unsupervised Pretraining for Neural Value Approximation

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission
Keywords: reinforcement learning, Neural Tangent Kernels, unsupervised pretraining, neural value approximation
TL;DR: The paper presents an unsupervised pretraining approach that learns initializations of the critic/value network with desirable generalization properties for deep reinforcement learning.
Abstract: Deep neural networks are powerful function approximators and have been employed successfully to parameterize value functions in deep reinforcement learning. Neural value approximation is a powerful paradigm for model-free control, but it can often result in instability and divergence, especially when combined with off-policy learning and bootstrapping. Recent works have revealed intrinsic connections between the unstable behavior of neural value approximation and the generalization properties of the value network/critic. Motivated by this, we propose a simple and computationally efficient unsupervised pretraining method to be performed before neural value learning. The method learns initializations of the critic parameters that correspond to Neural Tangent Kernels with desirable generalization structure. We demonstrate the merits of our approach by combining it with the Soft Actor-Critic algorithm and evaluating it on the continuous control environments of the DeepMind Control Suite. Our approach yields considerable improvements in reward accumulation, sample efficiency, and stability on the majority of the environments. Furthermore, the proposed pretraining allows us to retain these performance gains when changing the hidden-layer activation function of the critic architecture.
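To make the idea concrete, below is a minimal sketch of what NTK-shaping pretraining of a critic could look like. The abstract does not specify the actual objective, so everything here is an assumption: the critic architecture, the use of the empirical NTK Gram matrix, and in particular the diagonal (low-interference) target kernel are all illustrative choices, not the paper's method. The sketch only shows the general recipe of optimizing the initial critic parameters so that the induced kernel has a prescribed structure, using unlabeled states and no rewards.

```python
# Hypothetical sketch (PyTorch): shaping the empirical NTK of a critic
# before RL training. CriticNet, ntk_gram, and the identity target kernel
# are illustrative assumptions, not the paper's actual design.
import torch
import torch.nn as nn


class CriticNet(nn.Module):
    """A generic MLP critic; the paper's architecture is not specified."""

    def __init__(self, in_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def ntk_gram(model, x):
    """Empirical NTK Gram matrix K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>."""
    grads = []
    for i in range(x.shape[0]):
        out = model(x[i:i + 1])
        # create_graph=True so the pretraining loss can backprop through K.
        g = torch.autograd.grad(out.sum(), model.parameters(), create_graph=True)
        grads.append(torch.cat([p.reshape(-1) for p in g]))
    J = torch.stack(grads)          # (batch, n_params) Jacobian
    return J @ J.T                  # (batch, batch) Gram matrix


def pretrain(model, states, steps=200, lr=1e-3):
    """Push the critic's NTK toward a diagonal target, which discourages
    interference between distinct states. The identity target is an
    assumed stand-in for whatever 'desirable structure' the paper uses."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        K = ntk_gram(model, states)
        K = K / K.diagonal().mean().clamp_min(1e-8)   # scale-normalize
        loss = (K - torch.eye(K.shape[0])).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


if __name__ == "__main__":
    torch.manual_seed(0)
    states = torch.randn(32, 8)              # unlabeled states; no rewards needed
    critic = pretrain(CriticNet(8), states)  # then hand `critic` to SAC as its init
```

Note the unsupervised character of the recipe: it touches only states sampled from the environment (or any state distribution), never rewards or bootstrapped targets, so it runs entirely before value learning and the resulting parameters simply replace the default initialization of the SAC critic.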
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)