Abstract: As a generalization of reinforcement learning (RL) to parametrizable goals, goal-conditioned RL (GCRL) has a broad range of applications, particularly in challenging robotics tasks. Recent work has established that the optimal value function of GCRL, $Q^\ast(s, a, g)$, has a quasipseudometric structure, leading to targeted neural architectures that respect this structure. However, the relevant analyses assume a sparse reward setting, a factor known to aggravate sample complexity. We show that the key property underpinning a quasipseudometric, viz., the triangle inequality, is preserved under a dense reward setting as well, and we identify the key condition necessary for the triangle inequality to hold. Contrary to earlier findings where dense rewards were shown to be detrimental to GCRL, we conjecture that dense reward functions that satisfy this condition can only improve, never worsen, sample complexity. We evaluate this proposal in 12 standard GCRL benchmark environments featuring challenging continuous control tasks. Our empirical results confirm that training a quasipseudometric value function in our dense reward setting indeed either improves upon, or preserves, the sample complexity of training with sparse rewards. This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity.
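As a rough illustration of the quasipseudometric structure mentioned in the abstract, the sketch below checks the triangle inequality on a toy problem. It assumes a hypothetical deterministic, undiscounted three-state environment with a sparse cost of 1 per step, so that the negated optimal value between two states equals the shortest-path cost; the graph, the function names, and this simplification are illustrative assumptions and are not taken from the paper.

```python
import itertools

INF = float("inf")  # marks a missing direct transition


def shortest_path_costs(weights):
    """All-pairs shortest-path costs on a directed graph (Floyd-Warshall)."""
    n = len(weights)
    d = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def satisfies_triangle_inequality(d):
    """Check d(i, j) <= d(i, k) + d(k, j) for every triple of states."""
    n = len(d)
    return all(
        d[i][j] <= d[i][k] + d[k][j]
        for i, k, j in itertools.product(range(n), repeat=3)
    )


# Hypothetical one-way cycle 0 -> 1 -> 2 -> 0; each step costs 1.
weights = [
    [0, 1, INF],
    [INF, 0, 1],
    [1, INF, 0],
]

d = shortest_path_costs(weights)
print(d)
print(satisfies_triangle_inequality(d))
```

In this toy case the cost from state 0 to state 1 differs from the cost from state 1 to state 0, so the distance is asymmetric while still obeying the triangle inequality, which is the defining feature of a quasi(pseudo)metric.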
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=BOq66KrngZ
Changes Since Last Submission:
* Deanonymized.
* Added Fig. 3 with complete results using PQE critic architecture.
* Minor changes in text to integrate the results from remaining experiments with PQE, as well as to improve flow and clarity.
* Included link to public codebase on GitHub in Sec. 5.
* Added Acknowledgments section (Sec. 7).
Code: https://github.com/khadimon/GCRL-Dense-Rewards
Assigned Action Editor: ~Goran_Radanovic1
Submission Number: 4840