Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Deep Reinforcement Learning, Exploration
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Diverse exploration while maintaining simplicity, generality and computational efficiency.
Abstract: Efficient exploration remains a pivotal challenge in reinforcement learning (RL).
While numerous methods have been proposed, their lack of simplicity, generality, and computational efficiency often leads researchers to choose simple techniques such as $\epsilon$-greedy.
Motivated by these considerations, we propose $\beta$-DQN.
This method improves exploration by constructing a set of diverse policies through a behavior function $\beta$ learned from the replay memory.
First, $\beta$ differentiates actions by their frequency at each state; this signal can be used to design exploration strategies that improve state coverage.
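A minimal sketch (not the authors' code) of one way to learn such a behavior function: train a classifier on replayed (state, action) pairs, so that $\beta(a \mid s)$ approximates the empirical action frequency at each state. The names `BetaNet` and `beta_loss` are hypothetical.

```python
# Hedged sketch: learning a behavior function beta from replay data.
# beta is trained to predict the action stored with each replayed state,
# so softmax(beta(s)) approximates the action frequencies at that state.
import torch
import torch.nn as nn

class BetaNet(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Logits over actions; softmax of these gives beta(a|s).
        return self.net(state)

def beta_loss(beta_net: BetaNet, states: torch.Tensor,
              actions: torch.Tensor) -> torch.Tensor:
    # Standard cross-entropy: maximizes log beta(a|s) on replayed pairs.
    return nn.functional.cross_entropy(beta_net(states), actions)
```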
Second, we constrain temporal difference (TD) learning to in-sample data and derive two functions $Q$ and $Q_{\textit{mask}}$.
Function $Q$ may overestimate unseen actions, providing a foundation for bias-correction exploration.
$Q_{\textit{mask}}$ reduces the values of unseen actions in $Q$ using $\beta$ as an action mask, thereby yielding a greedy policy that purely exploits in-sample data.
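The following sketch illustrates one plausible reading of this masking and of in-sample TD learning: $Q$-values of actions with low $\beta$ probability are pushed to $-\infty$, and TD targets bootstrap only over actions the replay memory supports. The threshold `eps_mask` is a hypothetical hyper-parameter, not necessarily the paper's exact rule.

```python
# Hedged sketch: deriving Q_mask from Q with beta as an action mask,
# plus an in-sample TD target that never bootstraps from unseen actions.
import torch

def masked_q(q_values: torch.Tensor, beta_probs: torch.Tensor,
             eps_mask: float = 0.05) -> torch.Tensor:
    # q_values, beta_probs: [batch, num_actions].
    # Unseen actions (low beta probability) are set to -inf, so the
    # greedy policy over Q_mask purely exploits in-sample data.
    return q_values.masked_fill(beta_probs < eps_mask, float("-inf"))

def in_sample_td_target(rewards: torch.Tensor, next_q: torch.Tensor,
                        next_beta: torch.Tensor, done: torch.Tensor,
                        gamma: float = 0.99,
                        eps_mask: float = 0.05) -> torch.Tensor:
    # Max over in-sample actions only at the next state.
    best = masked_q(next_q, next_beta, eps_mask).max(dim=-1).values
    return rewards + gamma * (1.0 - done) * best
```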
We combine $\beta$, $Q$, and $Q_{\textit{mask}}$ to construct a set of policies ranging from exploration to exploitation.
Then an adaptive meta-controller selects an effective policy for each episode.
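As a rough illustration of such a meta-controller, the sketch below treats each candidate policy as an arm of a sliding-window UCB bandit whose reward is the episodic return; this is an assumption about the selection rule, and the paper's actual mechanism may differ. `MetaController`, `window`, and `c` are hypothetical names and parameters.

```python
# Hedged sketch: an adaptive meta-controller that picks one policy per
# episode, assuming a sliding-window UCB bandit over the policy set.
import math
from collections import deque

class MetaController:
    def __init__(self, num_policies: int, window: int = 50, c: float = 1.0):
        # Recent episodic returns per policy; old results are forgotten.
        self.history = [deque(maxlen=window) for _ in range(num_policies)]
        self.c = c
        self.t = 0

    def select(self) -> int:
        self.t += 1
        # Play each policy at least once before scoring.
        for i, h in enumerate(self.history):
            if not h:
                return i
        # UCB score: windowed mean return plus an exploration bonus.
        scores = [
            sum(h) / len(h) + self.c * math.sqrt(math.log(self.t) / len(h))
            for h in self.history
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, policy_idx: int, episode_return: float) -> None:
        self.history[policy_idx].append(episode_return)
```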
$\beta$-DQN is straightforward to implement, imposes minimal hyper-parameter tuning demands, and adds a modest computational overhead to DQN.
Our experiments, conducted on simple and challenging exploration domains, demonstrate that $\beta$-DQN significantly improves performance and exhibits broad applicability across a wide range of tasks.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7079