Towards An Option Basis To Optimize All Rewards

Siddarth Chandrasekar; Marlos C. Machado

Published: 01 Jul 2025, Last Modified: 15 Jul 2025

Venue: RLBrew: Ingredients for Developing Generalist Agents workshop (RLC 2025)

License: CC BY 4.0

Keywords: Unsupervised RL, Hierarchical RL

TL;DR: We propose using the eigenvectors of the graph Laplacian as reward features within the Option Keyboard framework.

Abstract: The Option Keyboard framework enables efficient behavior generation by composing a set of basis options. However, it remains unclear how to construct a global and compact basis, from scratch, for solving any given task in an environment. In this work, we investigate using the eigenvectors of the graph Laplacian of the environment to form such a basis. The behaviors obtained from such eigenvectors are known as eigenoptions. We empirically demonstrate that a sufficiently large eigenoption basis, combined with Generalized Policy Improvement, can recover near-optimal policies in the goal-reaching tasks we considered. Building on this, we introduce the Laplacian Keyboard, which matches this performance while requiring a substantially smaller set of options. Finally, we briefly outline a method for constructing a universal optimal option basis capable of solving any task within a given environment.
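The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small 4-connected gridworld, the combinatorial Laplacian L = D - A, the usual eigenoption intrinsic reward e[s'] - e[s], and a tabular Generalized Policy Improvement step. The function names (`grid_laplacian`, `eigenoption_reward`, `gpi_action`) and the 3x4 grid are hypothetical choices made for illustration only.

```python
import numpy as np

def grid_laplacian(rows, cols):
    """Combinatorial Laplacian L = D - A of a 4-connected grid (assumed environment)."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            s = r * cols + c
            if r + 1 < rows:                      # edge to the cell below
                t = (r + 1) * cols + c
                A[s, t] = A[t, s] = 1.0
            if c + 1 < cols:                      # edge to the cell on the right
                t = r * cols + (c + 1)
                A[s, t] = A[t, s] = 1.0
    D = np.diag(A.sum(axis=1))
    return D - A

def eigenoption_reward(e, s, s_next):
    """Intrinsic reward for the eigenoption of eigenvector e: change in e along a transition."""
    return e[s_next] - e[s]

def gpi_action(q_values, s):
    """GPI: act greedily w.r.t. the max over policies of Q_i(s, a).

    q_values has shape (num_policies, num_states, num_actions) and holds each
    basis policy's action values evaluated on the current task.
    """
    return int(np.argmax(q_values[:, s, :].max(axis=0)))

# Eigenvectors of the Laplacian, sorted by ascending eigenvalue (eigh handles symmetric L).
L = grid_laplacian(3, 4)
eigvals, eigvecs = np.linalg.eigh(L)
e1 = eigvecs[:, 1]                                # first non-constant eigenvector

# Rewards telescope along a trajectory: the return equals e1[end] - e1[start].
path = [0, 1, 2, 3]
ret = sum(eigenoption_reward(e1, a, b) for a, b in zip(path[:-1], path[1:]))

# Toy GPI call: two hypothetical basis policies, 12 states, 4 actions.
q = np.zeros((2, 12, 4))
q[1, 0, 2] = 1.0                                  # policy 1 prefers action 2 in state 0
best = gpi_action(q, 0)
```

In this sketch each eigenvector defines one reward feature, and GPI selects, in each state, the action that looks best under any policy in the basis. How the action values are learned and how options are composed are left out.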

Submission Number: 18
