Stochastic Differential Policy Optimization: A Rough Path Approach to Reinforcement Learning

Published: 28 Jun 2025, Last Modified: 28 Jun 2025, TASC 2025, CC BY 4.0
Keywords: Stochastic Control, Reinforcement Learning, Rough Path Theory, Pontryagin Maximum Principle, Operator Learning
TL;DR: We extend differential RL to the stochastic setting with theoretical results on convergence, sample complexity, and regret.
Abstract: We extend Differential Policy Optimization (DPO) to stochastic settings by deriving a discrete-time algorithm from the stochastic Pontryagin Maximum Principle using rough path theory. The framework preserves DPO's operator-based structure while incorporating stochasticity via Brownian and second-level rough path increments. We prove pointwise convergence, establish sample complexity bounds, and derive a regret bound of $O(K^{5/6})$. This provides a theoretically grounded approach to policy learning in continuous-time stochastic control settings.
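The abstract refers to incorporating stochasticity via Brownian and second-level rough path increments. As a minimal illustrative sketch (not the paper's algorithm, whose details are not given here), the snippet below samples the two signature levels for a scalar Brownian motion on a uniform grid: the level-1 increments $\Delta W$ and the level-2 Itô iterated integrals $\int W\,dW = \tfrac{1}{2}(\Delta W^2 - \Delta t)$.

```python
import numpy as np

def rough_path_increments(T=1.0, n_steps=1000, seed=0):
    """Sample level-1 and level-2 rough path increments of scalar
    Brownian motion over a uniform grid.

    Illustrative only: function name and interface are assumptions,
    not taken from the paper. For scalar W, the level-2 Ito increment
    over [t, t + dt] is (dW**2 - dt) / 2.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Level 1: independent Gaussian increments with variance dt.
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    # Level 2: Ito iterated integral over each subinterval.
    level2 = 0.5 * (dW**2 - dt)
    return dW, level2
```

In a discretized scheme derived from the stochastic Pontryagin Maximum Principle, such increments would drive the state and adjoint updates; the mean of the level-2 increments is zero under the Itô convention.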
Submission Number: 10