Bi-linear Value Networks for Multi-goal Reinforcement Learning

Zhang-Wei Hong; Ge Yang; Pulkit Agrawal

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Poster · Readers: Everyone

Keywords: Multi-goal reinforcement learning, universal value function approximator

Abstract: Universal value functions are a core component of off-policy multi-goal reinforcement learning. The de-facto paradigm is to approximate Q(s, a, g) using monolithic neural networks that lack the inductive biases needed to produce complex interactions between the state s and the goal g. In this work, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s, whereas the second component, ϕ(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme improves sample efficiency over the original monolithic value approximators and transfers better to unseen goals. We demonstrate significant learning speed-ups over a variety of tasks on a simulated robot arm, and on the challenging task of dexterous manipulation with a Shadow hand.
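The decomposition Q(s, a, g) = f(s, a)ᵀ ϕ(s, g) can be illustrated with a minimal NumPy sketch of the forward pass only. It assumes small two-layer ReLU MLPs with random, untrained weights for f and ϕ; the names `BilinearQ`, `init_mlp`, and `mlp`, the layer sizes, and the embedding dimension are all hypothetical choices for illustration, not the paper's architecture or training procedure.

```python
import numpy as np


def init_mlp(rng, sizes):
    """Random weights for a small ReLU MLP (hypothetical architecture)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


class BilinearQ:
    """Sketch of Q(s, a, g) = f(s, a) . phi(s, g) with untrained MLPs."""

    def __init__(self, state_dim, action_dim, goal_dim,
                 embed_dim=16, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        # f sees (state, action): the "local dynamics" vector field.
        self.f_params = init_mlp(rng, [state_dim + action_dim, hidden, embed_dim])
        # phi sees (state, goal): the "state-to-goal" vector field.
        self.phi_params = init_mlp(rng, [state_dim + goal_dim, hidden, embed_dim])

    def f(self, s, a):
        return mlp(self.f_params, np.concatenate([s, a], axis=-1))

    def phi(self, s, g):
        return mlp(self.phi_params, np.concatenate([s, g], axis=-1))

    def q(self, s, a, g):
        # Dot product of the two embed_dim-dimensional vectors.
        return np.sum(self.f(s, a) * self.phi(s, g), axis=-1)


# Usage: evaluate Q for one fixed state over many actions and goals.
rng = np.random.default_rng(1)
net = BilinearQ(state_dim=10, action_dim=4, goal_dim=3, embed_dim=4)
s = rng.standard_normal(10)
actions = rng.standard_normal((32, 4))
goals = rng.standard_normal((40, 3))

F = net.f(np.tile(s, (32, 1)), actions)     # shape (32, 4)
Phi = net.phi(np.tile(s, (40, 1)), goals)   # shape (40, 4)
Q = F @ Phi.T                               # shape (32, 40): Q[i, j] = Q(s, a_i, g_j)
print(Q.shape, np.linalg.matrix_rank(Q))
```

For a fixed state, the table of Q-values over actions × goals is the product of a 32×d and a d×40 matrix, so its rank is at most the embedding dimension d (here 4). This is one way to read the "low-rank approximation" described in the abstract; a monolithic network Q(s, a, g) imposes no such bound.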

One-sentence Summary: We propose a bilinear value function for multi-goal reinforcement learning and show superior sample efficiency and generalizability.
