Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning

Dhruv Shah; Peng Xu; Yao Lu; Ted Xiao; Alexander T Toshev; Sergey Levine; Brian Ichter

12 Oct 2021 (modified: 04 May 2025), Deep RL Workshop NeurIPS 2021, Readers: Everyone

Keywords: hierarchical reinforcement learning, planning, representation learning, robotics

TL;DR: We introduce value function spaces, a learned representation of state through the values of low-level skills, which captures affordances and ignores distractors to enable long-horizon reasoning and zero-shot generalization.

Abstract: Reinforcement learning can train policies that effectively perform complex tasks. However, for long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills. Hierarchical reinforcement learning aims to enable this by providing a bank of low-level skills as action abstractions. Hierarchies can further improve on this by abstracting the state space as well. We posit that a suitable state abstraction should depend on the capabilities of the available lower-level policies. We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill. These value functions capture the affordances of the scene, thus forming a representation that compactly abstracts task-relevant information and robustly ignores distractors. Empirical evaluations for maze-solving and robotic manipulation tasks demonstrate that our approach improves long-horizon performance and enables better zero-shot generalization than alternative model-free and model-based methods.
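
The core idea in the abstract, representing a state by the values of the available skills, can be illustrated with a minimal sketch. This is not the authors' implementation: the state fields, the two hand-written value functions, and the helper name `vfs_embedding` are all hypothetical, and learned value functions are replaced here by simple rules purely for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical state type: a flat dictionary of scene features.
State = Dict[str, object]
ValueFn = Callable[[State], float]


def vfs_embedding(state: State, value_fns: Sequence[ValueFn]) -> List[float]:
    """Stack one value estimate per skill into a single vector."""
    return [float(v(state)) for v in value_fns]


# Hypothetical stand-ins for learned skill value functions.
# Each returns a high value when its skill can currently succeed.
def v_open_door(s: State) -> float:
    return 1.0 if s["near_door"] and not s["door_open"] else 0.0


def v_pick_key(s: State) -> float:
    return 1.0 if s["key_visible"] and not s["holding_key"] else 0.0


skills = [v_open_door, v_pick_key]

s1: State = {
    "near_door": True,
    "door_open": False,
    "key_visible": False,
    "holding_key": False,
    "wall_color": "red",  # distractor: no skill value depends on it
}
s2: State = dict(s1, wall_color="blue")  # same scene, different distractor

z1 = vfs_embedding(s1, skills)
z2 = vfs_embedding(s2, skills)
```

In this toy setup the two states differ only in `wall_color`, which neither value function reads, so they map to the same embedding. A higher-level policy or planner operating on such vectors would then see one entry per skill rather than the raw observation.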

Community Implementations: [4 code implementations](https://www.catalyzex.com/paper/value-function-spaces-skill-centric-state/code)
