Model Risk-sensitive Offline Reinforcement Learning

Gwangpyo Yoo; Honguk Woo

Published: 22 Jan 2025, Last Modified: 18 May 2025. Venue: ICLR 2025 Poster. License: CC BY 4.0

Keywords: risk-sensitive offline reinforcement learning, reinforcement learning, offline reinforcement learning, risk, model risk

TL;DR: We propose a model risk-sensitive offline RL framework with a critic-ensemble criterion that captures model risk effectively. To keep the model risk calculation precise, we employ Fourier feature networks.

Abstract: Offline reinforcement learning (RL) is becoming critical in risk-sensitive areas such as finance and autonomous driving, where incorrect decisions can lead to substantial financial loss or compromised safety. However, traditional risk-sensitive offline RL methods often struggle to assess risk accurately: minor errors in the estimated return can cause significant inaccuracies in the risk estimate. These challenges are intensified by the distribution shifts inherent in offline RL. To mitigate these issues, we propose a model risk-sensitive offline RL framework designed to minimize the worst-case risk across a set of plausible alternative scenarios, rather than minimizing the estimated risk alone. We present a critic-ensemble criterion that identifies the plausible alternative scenarios without introducing additional hyperparameters. We also incorporate the learned Fourier feature framework and the IQN framework to address spectral bias in neural networks, which can otherwise lead to severe errors in calculating model risk. Our experiments in finance and self-driving scenarios demonstrate that the proposed framework significantly reduces risk, by $11.2\%$ to $18.5\%$ compared to the best-performing risk-sensitive offline RL baseline, particularly in highly uncertain environments.
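
The worst-case idea in the abstract can be illustrated with a small sketch. It is not the authors' method: it assumes risk is measured by an empirical lower-tail CVaR of return samples, and it treats every ensemble member as a plausible scenario, whereas the paper's critic-ensemble criterion selects which scenarios count as plausible. The function names (`cvar_lower_tail`, `worst_case_cvar`, `pick_action`) and the toy numbers are hypothetical.

```python
import math


def cvar_lower_tail(returns, alpha):
    """Empirical CVaR: mean of the worst ceil(alpha * n) return samples."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


def worst_case_cvar(ensemble_returns, alpha):
    """Most pessimistic CVaR across critics.

    Assumption: every critic in the ensemble is a plausible scenario.
    """
    return min(cvar_lower_tail(r, alpha) for r in ensemble_returns)


def pick_action(candidates, alpha):
    """Choose the action whose worst-case CVaR of return is highest."""
    return max(candidates, key=lambda a: worst_case_cvar(candidates[a], alpha))


# Toy data: for each action, one list of return samples per critic.
candidates = {
    "hold": [[1.0, 1.0, 1.0, 1.0], [0.5, 1.0, 1.0, 1.5]],
    "trade": [[-4.0, 2.0, 3.0, 5.0], [0.0, 2.0, 3.0, 4.0]],
}

best = pick_action(candidates, alpha=0.25)
```

Here higher return is better, so maximizing the worst-case CVaR of return plays the role of minimizing worst-case risk. In the toy data, "trade" looks attractive under the second critic but has a severe tail under the first, so the worst-case rule prefers "hold".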

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 6471
