TL;DR: This work proposes the first performative reinforcement learning algorithm that converges to the desired performatively optimal policy with polynomial computational complexity.
Abstract: Performative reinforcement learning is an emerging dynamic decision-making framework that extends reinforcement learning to the common setting where the agent's policy can change the environmental dynamics. Existing works on performative reinforcement learning aim only at a performatively stable (PS) policy that maximizes an approximate value function. However, there can be a positive constant gap between the PS policy and the desired performatively optimal (PO) policy that maximizes the original value function. In contrast, this work proposes a zeroth-order performative policy gradient (0-PPG) algorithm that **for the first time converges to the desired PO policy with polynomial computational complexity under mild conditions**. For the convergence analysis, we prove two important properties of the nonconvex value function. First, when the policy regularizer dominates the environmental shift, the value function satisfies a certain gradient dominance property, so that any stationary point of the value function is a desired PO policy. Second, although the value function has an unbounded gradient, we prove that all sufficiently stationary points lie in a convex and compact policy subspace $\Pi_{\Delta}$, in which the policy values are bounded below by a constant $\Delta>0$, so that the gradient becomes bounded and Lipschitz continuous.
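To make the zeroth-order idea concrete, the sketch below illustrates a single two-point zeroth-order policy gradient step followed by projection onto a $\Delta$-truncated policy set. This is only an illustrative sketch under assumed choices, not the paper's 0-PPG algorithm: the names `evaluate_value` and `project_to_pi_delta`, the step size, and the smoothing radius are hypothetical placeholders.

```python
import numpy as np


def simplex_projection(v, z):
    """Euclidean projection of v onto {x >= 0, sum(x) = z} (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - z))[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def project_to_pi_delta(pi, delta):
    """Project each row of a tabular policy onto the simplex with all entries >= delta.

    Assumes delta <= 1 / n_actions so that the truncated simplex is nonempty.
    """
    n_states, n_actions = pi.shape
    out = np.empty_like(pi)
    for s in range(n_states):
        # Distribute the remaining mass (1 - n_actions * delta) on a shifted simplex.
        out[s] = simplex_projection(pi[s] - delta, 1.0 - n_actions * delta) + delta
    return out


def zeroth_order_step(pi, evaluate_value, delta, lr=0.1, smoothing=1e-2, rng=None):
    """One two-point zeroth-order ascent step on the performative value.

    `evaluate_value(pi)` is assumed to return a (possibly noisy) estimate of the
    value of deploying pi in the environment that pi itself induces.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(pi.shape)
    u /= np.linalg.norm(u)  # random unit perturbation direction
    v_plus = evaluate_value(pi + smoothing * u)
    v_minus = evaluate_value(pi - smoothing * u)
    # Two-point estimator of the gradient, scaled by the parameter dimension.
    grad_est = pi.size * (v_plus - v_minus) / (2.0 * smoothing) * u
    return project_to_pi_delta(pi + lr * grad_est, delta)
```

The projection step reflects the abstract's observation that sufficiently stationary points lie in $\Pi_{\Delta}$, where the gradient is bounded and Lipschitz continuous; restricting iterates to that set is one natural (assumed) way such a property could be exploited.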
Primary Area: Reinforcement Learning
Keywords: performative reinforcement learning, performatively optimal
Submission Number: 13617