Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies

Published: 18 Sept 2025, Last Modified: 29 Oct 2025
Venue: NeurIPS 2025 poster
License: CC BY 4.0
Keywords: policy gradient, proportional-integral-derivative control, reinforcement learning, non-convex optimization
TL;DR: We develop policy gradient algorithms with global optimality and convergence guarantees for reinforcement learning with PID control policy parameterization.
Abstract: We develop policy gradient algorithms with global optimality and convergence guarantees for reinforcement learning (RL) with proportional-integral-derivative (PID) parameterized control policies. RL enables learning control policies through direct interaction with a system, without the explicit model knowledge typically assumed in classical control. The PID policy architecture offers built-in structural advantages, such as superior tracking performance, elimination of steady-state error, and robustness to model error, that have made it a widely adopted paradigm in practice. Despite these advantages, the PID parameterization has received limited attention in the RL literature, and PID control designs continue to rely on heuristic tuning rules without theoretical guarantees. We address this gap by rigorously integrating PID control with RL, offering theoretical guarantees while retaining the practical advantages that have made PID control ubiquitous. Specifically, we first formulate PID control design as an optimization problem over a control policy parameterized by proportional, integral, and derivative components. We derive exact expressions for the policy gradients in these parameters, and leverage them to develop both model-based and model-free policy gradient algorithms for PID policies. We then establish gradient dominance properties of the PID policy optimization problem, and provide theoretical guarantees on convergence and global optimality in this setting. Finally, we benchmark the performance of our algorithms on the controlgym suite of environments.
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 24962