General Risk Measure meets Offline RL: Provably Efficient Risk-Sensitive Offline RL via Optimized Certainty Equivalent

ICLR 2026 Conference Submission 25306 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Offline RL, Risk-Sensitive, Optimized Certainty Equivalent, General Risk Measure
Abstract: We study risk-sensitive reinforcement learning (RL), which is crucial in scenarios involving uncertainty and potential adverse outcomes. However, existing works on risk-sensitive RL either focus only on a specific risk measure or overlook the offline setting. In this work, we investigate provably efficient risk-sensitive RL in the offline setting under a general risk measure, the optimized certainty equivalent (OCE), which captures various risk measures studied in prior risk-sensitive RL works, such as value-at-risk, entropic risk, and mean-variance. To the best of our knowledge, we (i) introduce the first offline OCE-RL frameworks and propose corresponding pessimistic value iteration algorithms (OCE-PVI) for both dynamic and static risk measures; (ii) establish suboptimality bounds for these algorithms, which reduce to known results for risk-sensitive RL as well as risk-neutral RL under appropriate utility functions; (iii) derive the first information-theoretic lower bound on the sample complexity of offline risk-sensitive RL, which matches the upper bounds and certifies the optimality of our algorithms; and (iv) propose the first provably efficient risk-sensitive RL algorithms with linear function approximation for both dynamic and static risk measures, together with rigorous suboptimality bounds, yielding a scalable and model-free approach.
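For readers unfamiliar with the OCE, a minimal sketch in the standard Ben-Tal–Teboulle form is given below; the utility functions listed are common textbook instances assumed here for illustration, and may differ from the paper's exact instantiations.

% Optimized certainty equivalent (OCE) of a random return X under a
% concave, nondecreasing utility u with u(0) = 0 and 1 \in \partial u(0):
\[
  \mathrm{OCE}_u(X) \;=\; \sup_{\lambda \in \mathbb{R}}
  \Big\{ \lambda + \mathbb{E}\big[\, u(X - \lambda) \,\big] \Big\}.
\]
% Standard special cases (illustrative choices of u, assumed here):
%   u(t) = t                            :  OCE_u(X) = E[X]                                  (risk-neutral RL)
%   u(t) = (1 - e^{-\gamma t})/\gamma   :  OCE_u(X) = -(1/\gamma)\log E[e^{-\gamma X}]      (entropic risk)
%   u(t) = (1/\alpha)\min(t, 0)         :  OCE_u(X) = \mathrm{CVaR}_\alpha(X)               (conditional value-at-risk)
%   u(t) = t - c t^2, c > 0 (on its increasing range) :  OCE_u(X) = E[X] - c\,\mathrm{Var}(X)  (mean-variance)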
Primary Area: reinforcement learning
Submission Number: 25306