Acceleration in Policy Optimization

Published: 20 Jul 2023, Last Modified: 01 Sept 2023, EWRL 16
Keywords: acceleration, optimism, adaptivity, inexact policy gradients, policy optimization, actor-critic, meta-gradients, meta-learning, reinforcement learning, extragradient, momentum
TL;DR: We introduce a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) through predictive and adaptive directions of policy ascent.
Abstract: We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) through predictive and adaptive directions of (functional) policy ascent. Leveraging the connection between policy iteration and policy gradient methods, we view policy optimization algorithms as iteratively solving a sequence of surrogate objectives, local lower bounds on the original objective. We define optimism as predictive modelling of the future behavior of a policy, and hindsight adaptation as taking immediate and anticipatory corrective actions to mitigate accumulating errors from overshooting predictions or delayed responses to change. We use this shared lens to jointly express other well-known algorithms, including model-based policy improvement based on forward search and optimistic meta-learning algorithms. We show connections with Anderson acceleration, Nesterov's accelerated gradient, extra-gradient methods, and linear extrapolation in the update rule. We analyze properties of the formulation, design an optimistic policy gradient algorithm that adapts via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration on an illustrative task.
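To make the notion of an optimistic, extrapolated ascent direction concrete, below is a minimal sketch (an assumption for illustration, not the authors' algorithm): an extragradient-style policy-gradient step on a toy softmax bandit, where the gradient is evaluated at a predicted (extrapolated) policy before updating the current parameters. The reward vector, step size, and iteration count are all assumed for the example.

```python
# Minimal illustrative sketch (assumed setup, not the paper's implementation):
# extragradient-style "optimistic" policy-gradient ascent on a 3-armed softmax bandit.
import numpy as np

rewards = np.array([1.0, 0.5, 0.2])   # fixed rewards of a toy 3-armed bandit (assumed)

def policy(theta):
    # softmax policy over arms
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_return(theta):
    # gradient of J(theta) = pi_theta . rewards for a softmax policy
    pi = policy(theta)
    return pi * (rewards - pi @ rewards)

theta = np.zeros(3)
eta = 0.5                              # step size (assumed)
for _ in range(200):
    # 1) optimistic lookahead: predict where the policy is heading
    theta_pred = theta + eta * grad_return(theta)
    # 2) update the current parameters using the gradient at the prediction
    theta = theta + eta * grad_return(theta_pred)

print(policy(theta))                   # probability mass concentrates on the best arm
```

In the paper's terminology, step 1 plays the role of the predictive (optimistic) direction, while evaluating the gradient at the prediction before committing the update is one simple way of correcting for overshooting; the adaptive variant would instead tune such extrapolation via meta-gradients.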