Keywords: Reinforcement Learning, Policy Search, Policy Optimization, Adaptive, Policy Gradient, Policy Space
Abstract: Policy search is one of the most effective classes of reinforcement learning methods for solving continuous control tasks. These methods attempt to find a good policy for an agent by fixing a family of parametric policies and then searching directly for the parameters that optimize the long-term reward. However, this parametric policy space represents just a subset of all possible Markovian policies, and finding a good parametrization for a given task is a challenging problem in its own right, typically left to human expertise. In this paper, we propose a novel, model-free, adaptive-space policy search algorithm, GAPS (Gradient-based Adaptive Policy Search). We start from a simple policy space; then, based on the observations we receive from the unknown environment, we build a sequence of policy spaces of increasing complexity, which yield increasingly sophisticated optimized policies at each epoch. The final result is a parametric policy whose structure (including the number of parameters) is fitted to the problem at hand without any prior knowledge of the task. Finally, we test our algorithm on a selection of continuous control tasks, evaluating the sequence of policies so obtained and comparing the results with those of traditional policy optimization methods that employ a fixed policy space.
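The abstract describes GAPS only at a high level. Below is a minimal, self-contained sketch of the general scheme it outlines, under assumptions not taken from the paper: polynomial state features of increasing degree play the role of the growing policy spaces, plain REINFORCE is the within-space optimizer, and the environment is a hypothetical toy 1-D control task. None of the names or details below are the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def env_step(s, a):
    """Hypothetical toy 1-D task: reward for driving the state toward 0."""
    s_next = np.clip(s + 0.1 * a + 0.01 * rng.normal(), -5.0, 5.0)
    return s_next, -s_next ** 2

def features(s, degree):
    """Polynomial features [1, s, ..., s^degree] span the current policy space."""
    return np.array([s ** k for k in range(degree + 1)])

def rollout(theta, degree, horizon=50, sigma=0.2):
    """One trajectory under a Gaussian policy a ~ N(theta . phi(s), sigma^2);
    returns the total reward and the trajectory score grad_theta log p(tau)."""
    s, ret, score = rng.normal(), 0.0, np.zeros_like(theta)
    for _ in range(horizon):
        phi = features(s, degree)
        mean = theta @ phi
        a = mean + sigma * rng.normal()
        score += (a - mean) / sigma ** 2 * phi   # grad_theta log pi(a|s)
        s, r = env_step(s, a)
        ret += r
    return ret, score

degree = 0                        # start from the simplest policy space
theta = np.zeros(degree + 1)
for epoch in range(4):
    for _ in range(200):          # REINFORCE steps inside the fixed space
        rets, scores = zip(*(rollout(theta, degree) for _ in range(8)))
        baseline = np.mean(rets)  # variance-reduction baseline
        g = np.mean([(R - baseline) * G for R, G in zip(rets, scores)], axis=0)
        theta += 0.05 * g / (np.linalg.norm(g) + 1e-8)  # normalized gradient step
    print(f"epoch {epoch}: degree={degree}, avg return={np.mean(rets):.2f}")
    degree += 1                   # enlarge the policy space for the next epoch
    theta = np.append(theta, 0.0) # embed the old policy in the larger space
```

Appending a zero coefficient when the space grows means each new space contains the previously optimized policy exactly, so the sequence of optimized policies can only improve; how the actual algorithm decides when and how to grow the space is specified in the paper, not here.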
Supplementary Material: zip
Submission Number: 21