Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies

Lingwei Zhu; Han Wang; Yukie Nagai

Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies

Lingwei Zhu, Han Wang, Yukie Nagai

Published: 22 Jan 2025, Last Modified: 25 Feb 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: reinforcement learning, offline reinforcement learning, actor critic, sparse policies

TL;DR: this paper for the first time addresses the issues of offline reinforcement learning with sparse policies

Abstract: Sparse continuous policies are distributions that can choose some actions at random yet keep strictly zero probability for the other actions, which are radically different from the Gaussian. They have important real-world implications, e.g. in modeling safety-critical tasks like medicine. The combination of offline reinforcement learning and sparse policies provides a novel paradigm that enables learning completely from logged datasets a safety-aware sparse policy. However, sparse policies can cause difficulty with the existing offline algorithms which require evaluating actions that fall outside of the current support. In this paper, we propose the first offline policy optimization algorithm that tackles this challenge: Fat-to-Thin Policy Optimization (FtTPO). Specifically, we maintain a fat (heavy-tailed) proposal policy that effectively learns from the dataset and injects knowledge to a thin (sparse) policy, which is responsible for interacting with the environment. We instantiate FtTPO with the general $q$-Gaussian family that encompasses both heavy-tailed and sparse policies and verify that it performs favorably in a safety-critical treatment simulation and the standard MuJoCo suite. Our code is available at https://github.com/lingweizhu/fat2thin.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 5466

Loading