Using Proto-Value Functions for Curriculum Generation in Goal-Conditioned RL

Henrik Metternich; Ahmed Hendawy; Pascal Klink; Jan Peters; Carlo D'Eramo

Published: 03 Nov 2023, Last Modified: 27 Nov 2023, GCRL Workshop

Confirmation: I have read and confirm that at least one author will be attending the workshop in person if the submission is accepted

Keywords: Reinforcement Learning, Curriculum Learning, Graph Laplacian

TL;DR: A curriculum learning approach that uses Proto-Value Functions to measure task similarities in goal-conditioned Reinforcement Learning.

Abstract: In this paper, we investigate the use of Proto-Value Functions (PVFs) for measuring the similarity between tasks in the context of Curriculum Learning (CL). PVFs provide a mathematical framework for generating basis functions over the state space of a Markov Decision Process (MDP). They capture the structure of the state space manifold and have been shown to be useful for value function approximation in Reinforcement Learning (RL). We show that even a few PVFs suffice to estimate the similarity between tasks. Based on this observation, we introduce a new algorithm called Curriculum Representation Policy Iteration (CRPI) that uses PVFs for CL, and we provide a proof of concept in a Goal-Conditioned Reinforcement Learning (GCRL) setting.
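As a rough illustration of the idea in the abstract, the sketch below computes PVFs in the standard way, as the smoothest eigenvectors of the graph Laplacian of a state-space graph, and then compares two goal states in that low-dimensional embedding. It is not the CRPI algorithm and not necessarily the similarity measure used in the paper: the grid world, the combinatorial Laplacian L = D - A, the Euclidean distance, and the names `grid_adjacency`, `proto_value_functions`, and `goal_distance` are all assumptions made for this example.

```python
import numpy as np


def grid_adjacency(rows, cols):
    """Adjacency matrix of a 4-connected grid world (hypothetical toy MDP)."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:  # edge to the right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < rows:  # edge to the neighbour below
                A[i, i + cols] = A[i + cols, i] = 1.0
    return A


def proto_value_functions(A, k):
    """Return the k smallest Laplacian eigenvalues and their eigenvectors.

    Uses the combinatorial Laplacian L = D - A (an assumption; a normalized
    Laplacian is another common choice). Each column is one PVF.
    """
    D = np.diag(A.sum(axis=1))
    L = D - A
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[:k], eigvecs[:, :k]


def goal_distance(pvfs, g1, g2):
    """Illustrative dissimilarity between two goal states: Euclidean distance
    between their rows in the PVF embedding (an assumption, not the paper's
    measure)."""
    return float(np.linalg.norm(pvfs[g1] - pvfs[g2]))


# A 4 x 6 grid has 24 states; keep only the 3 smoothest PVFs.
A = grid_adjacency(4, 6)
eigvals, pvfs = proto_value_functions(A, k=3)

near = goal_distance(pvfs, 0, 1)   # two neighbouring goals
far = goal_distance(pvfs, 0, 23)   # goals in opposite corners
print(f"neighbouring goals: {near:.3f}, opposite corners: {far:.3f}")
```

In this toy setting, goals that are close on the graph land close together in the embedding even with only three PVFs, which is the kind of signal a curriculum could use to order goals from similar to dissimilar.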

Submission Number: 17
