Policy gradient approaches for multi-objective sequential decision making

Simone Parisi, Matteo Pirotta, Nicola Smacchia, Luca Bascetta, Marcello Restelli

IJCNN 2014 (modified: 08 Nov 2022)

Abstract: This paper investigates the use of policy gradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Policy gradient algorithms are popular, and gradient ascent algorithms have already been proposed to numerically solve multi-objective optimization problems, especially in combination with multi-objective evolutionary algorithms. Even so, little attention has so far been paid to the use of gradient information in multi-objective sequential decision problems. This paper presents two Multi-Objective Reinforcement Learning (MORL) approaches, called radial and Pareto following, which start from an initial policy and perform gradient-based policy-search procedures aimed at finding a set of non-dominated policies. Both algorithms are empirically evaluated and compared to state-of-the-art MORL algorithms on three MORL benchmark problems.
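To make the notion of "non-dominated policies" concrete, the following is a minimal sketch of Pareto-dominance filtering over vectors of expected returns, assuming every objective is to be maximized. It is not the radial or Pareto-following algorithm, which the abstract does not describe in detail. The names `dominates` and `non_dominated` and the sample `returns` values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if return vector `a` Pareto-dominates `b` (maximization).

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical expected returns of five policies on two objectives.
returns = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (0.5, 0.5)]
front = non_dominated(returns)
print(front)
```

Here `(1.5, 3.0)` and `(0.5, 0.5)` are discarded because `(2.0, 4.0)` is better on both objectives; the remaining three points form the approximate frontier for this toy set. The quadratic-time filter is chosen for readability rather than efficiency.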
