Exploring the State and Action Space in Reinforcement Learning with Infinite-Dimensional Confidence Balls
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: online reinforcement learning, reproducing kernel Hilbert space, embedding learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a novel approach that leverages reproducing kernel Hilbert spaces (RKHSs) to tackle the curse of dimensionality in online reinforcement learning with continuous state and action spaces.
Abstract: Reinforcement Learning (RL) is a powerful tool for solving complex decision-making problems. However, existing RL approaches suffer from the curse of dimensionality when dealing with large or continuous state and action spaces. This paper introduces RKHS-RL, a non-parametric online RL algorithm that addresses this challenge by utilizing reproducing kernels and the RKHS-embedding assumption. The algorithm handles both finite and infinite state and action spaces, as well as nonlinear relationships in the transition probabilities. RKHS-RL estimates the transition core using ridge regression and balances exploration and exploitation through infinite-dimensional confidence balls. The paper provides theoretical guarantees, showing that RKHS-RL achieves a sublinear regret bound of $\tilde{\mathcal{O}}(H\sqrt{T})$, where $T$ denotes the number of time steps of the algorithm and $H$ the horizon of the Markov Decision Process (MDP), so that the average regret per time step vanishes as $T$ grows.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7398
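The abstract's two ingredients, a ridge-regression estimate and a confidence region that drives exploration, can be illustrated with a generic kernelized sketch. The code below is not the RKHS-RL algorithm and does not estimate a transition core. It assumes a scalar regression target, a Gaussian kernel, and hypothetical names (`rbf_kernel`, `kernel_ridge_ucb`, `ridge`, `beta`, `bandwidth`) that do not appear in the submission. It only shows how kernel ridge regression yields both a point estimate and an uncertainty width that can be added as an optimistic bonus.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def kernel_ridge_ucb(X_train, y_train, X_query, ridge=1.0, beta=1.0, bandwidth=1.0):
    """Kernel ridge estimate plus an optimistic bonus (illustrative assumption).

    Returns (mean, width, ucb), where ucb = mean + beta * width. The width
    plays the role of a confidence radius: it is small near observed data
    and close to 1 far from it (k(x, x) = 1 for the Gaussian kernel).
    """
    n = X_train.shape[0]
    K_reg = rbf_kernel(X_train, X_train, bandwidth) + ridge * np.eye(n)
    k_q = rbf_kernel(X_train, X_query, bandwidth)  # shape (n, m)

    # Ridge-regression point estimate in the RKHS.
    alpha = np.linalg.solve(K_reg, y_train)
    mean = k_q.T @ alpha

    # Uncertainty: k(x, x) - k_x^T (K + ridge * I)^{-1} k_x.
    solved = np.linalg.solve(K_reg, k_q)
    var = 1.0 - np.sum(k_q * solved, axis=0)
    width = np.sqrt(np.maximum(var, 0.0))

    return mean, width, mean + beta * width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(20, 2))   # toy (state, action) features
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
    X_new = np.array([[0.0, 0.0], [50.0, 50.0]])  # one near, one far query
    mean, width, ucb = kernel_ridge_ucb(X, y, X_new, ridge=0.1, beta=2.0)
    print("mean:", mean)
    print("width:", width)
    print("ucb:", ucb)
```

Acting greedily with respect to `ucb` rather than `mean` favors poorly explored regions, which is the usual way a confidence set is turned into an exploration rule. How RKHS-RL constructs and uses its infinite-dimensional confidence balls is specified in the paper itself.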