Keywords: Policy Mirror Descent, Regularization, Reinforcement Learning
Abstract: Policy Mirror Descent (PMD) has emerged as a unifying framework in reinforcement learning (RL) by linking policy gradient methods with a first-order optimization method known as mirror descent. At its core, PMD incorporates two key regularization components: (i) a distance term that enforces a trust region for stable policy updates and (ii) an MDP regularizer that augments the reward function to promote structure and robustness. While PMD has been extensively studied in theory, empirical investigations remain scarce. This work provides a large-scale empirical analysis of the interplay between these two regularization techniques, running over 500k training seeds on small RL environments. Our results demonstrate that, although the two regularizers can partially substitute each other, their precise combination is critical for achieving robust performance. These findings highlight the potential for advancing research on more robust algorithms in RL.
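For reference, a standard formulation of the regularized PMD update that features both components described in the abstract reads (this is the commonly used generic form; the paper's exact notation and regularizer choices may differ):
$$
\pi_{k+1}(\cdot \mid s) \in \arg\max_{p \in \Delta(\mathcal{A})} \Big\{ \eta_k \,\big\langle Q^{\pi_k}_\tau(s,\cdot),\, p \big\rangle \;-\; \eta_k \tau\, h(p) \;-\; D_h\big(p,\ \pi_k(\cdot \mid s)\big) \Big\},
$$
where $h$ is a convex MDP regularizer (e.g., negative entropy) with temperature $\tau$, $D_h$ is the induced Bregman divergence acting as the distance/trust-region term, and $\eta_k$ is the step size. The divergence $D_h$ corresponds to component (i) and the term $\tau\, h$ to component (ii) in the abstract.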
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Jan_Felix_Kleuker1
Track: Fast Track: published work
Publication Link: kleukerjf@liacs.leidenuniv.nl
Submission Number: 105