Keywords: Exploration
TL;DR: A novel exploration method for reinforcement learning in which state difficulty and state novelty jointly drive the intrinsic reward.
Abstract: Reinforcement learning in environments with sparse rewards is a formidable challenge. Many exploration methods address it by encouraging agents to visit novel states. However, as the agent becomes familiar with the environment, state novelty fades, leaving exploration unguided in the later stages of learning. To address this problem, this work argues that the difficulty of reaching a state provides a stronger intrinsic motivation signal, one that can guide the agent throughout the learning process. The difficulty signal captures key information about the environment's underlying structure and the task's direction, information that state novelty alone does not convey. We then introduce a reward prediction network that learns a hybrid reward from both state difficulty and state novelty. This reward is initially high for novel states and gradually converges to the state's inherent difficulty as visitations accumulate. This dynamic formulation mitigates catastrophic forgetting and guides the agent throughout learning. We theoretically establish that this reward mechanism is a special form of reward shaping: it keeps the learned policy consistent with the original policy and transforms the sparse-reward problem into a dense-reward one, thereby accelerating the learning process. We evaluate the proposed Difficulty and Novelty Co-driven Exploration agent on several sparse-reward tasks, where it consistently achieves satisfactory results.
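The sketch below illustrates the kind of hybrid intrinsic reward the abstract describes: high for rarely visited states, converging to a difficulty signal as visit counts grow. It is a minimal illustration only; the class name, the count-based novelty bonus, the failure-rate difficulty estimate, and the exponential mixing schedule are all assumptions, not the paper's actual reward prediction network.

```python
# Illustrative sketch (not the paper's method): a hybrid intrinsic reward that
# starts at a novelty bonus for rarely visited states and converges to a
# difficulty estimate as visitations accumulate. All names and the mixing
# schedule are assumptions for illustration.
from collections import defaultdict
import math


class HybridIntrinsicReward:
    def __init__(self, decay=0.1):
        self.visit_counts = defaultdict(int)  # per-state visitation counts
        self.decay = decay                    # controls how fast novelty fades

    def novelty_bonus(self, state):
        # Count-based novelty: high for rarely visited states.
        return 1.0 / math.sqrt(1 + self.visit_counts[state])

    def difficulty_estimate(self, failures, attempts):
        # Placeholder difficulty signal (empirical failure rate of reaching the
        # state); the paper instead learns this with a reward prediction network.
        return failures / max(1, attempts)

    def reward(self, state, failures, attempts):
        self.visit_counts[state] += 1
        n = self.visit_counts[state]
        # The weight shifts from novelty toward difficulty as visits accumulate,
        # so the signal stays informative after novelty has worn off.
        w = math.exp(-self.decay * n)
        return w * self.novelty_bonus(state) + (1 - w) * self.difficulty_estimate(
            failures, attempts
        )
```

For example, `HybridIntrinsicReward().reward(state, failures=3, attempts=10)` would return a value dominated by the novelty bonus on the first visit and approach the 0.3 failure rate after many visits, mirroring the convergence behavior described in the abstract.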
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 23047