Convex is back: \\ Solving Belief MDPs via Convexity-Informed Deep Reinforcement Learning

Daniel Koutas; Daniel Hettegger; Kostas G. Papakonstantinou; Daniel Straub

Convex is back: \\ Solving Belief MDPs via Convexity-Informed Deep Reinforcement Learning

Daniel Koutas, Daniel Hettegger, Kostas G. Papakonstantinou, Daniel Straub

17 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Deep Reinforcement Learning, POMDP, Convexity

Abstract: We present a novel method for Deep Reinforcement Learning (DRL), incorporating the convex property of the value function over the belief space in Partially Observable Markov Decision Processes (POMDPs). We introduce hard- and soft-enforced convexity as two different approaches, and compare their performance against standard DRL on two well-known POMDP environments, namely the Tiger and FieldVisionRockSample problems. Our findings show that including the convexity feature can substantially increase performance of the agents, as well as increase robustness over the hyperparameter space, especially when testing on out-of-distribution domains.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 1307

Loading