HAL: Harmonic Learning in High-Dimensional MDPs

27 Sept 2024 (modified: 05 Feb 2025) | Submitted to ICLR 2025 | CC BY 4.0
Keywords: harmonic learning, harmonic analytic basis training
Abstract: Since the initial successes of deep reinforcement learning in learning policies purely by interacting with complex high-dimensional state representations, and after a decade of extensive research, deep neural policies have been applied to a striking variety of fields ranging from pharmaceuticals to foundation models. Yet one of the strongest assumptions of reinforcement learning is that the MDP provides a reward signal. While this assumption is convenient in certain fields, e.g. automated financial markets, it does not fit naturally in many others, where the computational complexity of providing such a signal for the task at hand is larger than that of actually learning one. Thus, in this paper we focus on learning policies in MDPs without this assumption, and study sequential decision making without access to reward information provided by the MDP. We introduce harmonic learning, a training method for high-dimensional MDPs, and provide a theoretically well-founded algorithm that significantly improves the sample complexity of deep neural policies. Our theoretical and empirical analysis demonstrates that harmonic learning achieves substantial improvements in sample-efficient training while constructing more stable and resilient policies that can generalize to uncertain environments.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 11774