Keywords: Reward Shaping, Reinforcement Learning
Abstract: Effective learners improve task performance and acquire new skills more efficiently by leveraging related prior knowledge. Reward shaping is central to many such approaches and facilitates knowledge transfer. However, misidentifying or misusing prior knowledge can impair learning. To tackle this challenge, we propose a novel shaping method, Target value As Potential (TAP), which uses the critic's target value as the potential function within the canonical Potential-Based Reward Shaping (PBRS) framework. It integrates readily with policy-gradient deep reinforcement learning algorithms and requires only minor modifications to existing training pipelines. This endows TAP with a combination of policy invariance and implementation simplicity that distinguishes it from many model-based methods. Our qualitative analysis and empirical evaluations demonstrate that TAP accelerates convergence and yields higher cumulative returns than baseline DRL algorithms. We evaluate TAP-augmented TD3 and D4PG across a range of tasks in the DeepMind Control Suite, where TAP significantly improves performance over the original TD3 and D4PG and consistently outperforms other reward shaping methods, including Heuristic-Guided Reinforcement Learning (HuRL) and Dynamic Potential-Based Reward Shaping (DPBRS).
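To make the abstract's description concrete, below is a minimal, hedged sketch of the TAP idea: the shaping term follows canonical PBRS, F(s, s') = gamma * Phi(s') - Phi(s), with the potential Phi taken to be the target critic's value estimate. The network names (`target_actor`, `target_critic`) and the exact instantiation of Phi are assumptions made for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

state_dim, action_dim, gamma = 8, 2, 0.99

# Hypothetical target networks, as maintained in TD3/D4PG-style training loops.
target_actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                             nn.Linear(64, action_dim), nn.Tanh())
target_critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                              nn.Linear(64, 1))

def tap_shaped_reward(reward, state, next_state, done):
    """PBRS shaping r + gamma*Phi(s') - Phi(s), with the critic target value
    as the potential (an assumed instantiation of TAP)."""
    with torch.no_grad():
        phi_s = target_critic(torch.cat([state, target_actor(state)], dim=-1))
        phi_next = target_critic(torch.cat([next_state, target_actor(next_state)], dim=-1))
        shaping = gamma * (1.0 - done) * phi_next - phi_s
    return reward + shaping

# Usage inside a standard off-policy update: shape the sampled rewards,
# leaving the rest of the TD3/D4PG pipeline unchanged.
batch = 32
s, s_next = torch.randn(batch, state_dim), torch.randn(batch, state_dim)
r, d = torch.randn(batch, 1), torch.zeros(batch, 1)
r_shaped = tap_shaped_reward(r, s, s_next, d)
```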
Primary Area: reinforcement learning
Submission Number: 21897