Abstract: As a generalization of reinforcement learning (RL) to parametrizable goals, goal-conditioned RL (GCRL) has a broad range of applications, particularly in challenging robotics tasks. Recent work has established that the optimal value function of GCRL, $Q^\ast(s, a, g)$, has a quasipseudometric structure, leading to targeted neural architectures that respect this structure. However, the relevant analyses assume a sparse reward setting, a known aggravating factor for sample complexity. We show that the key property underpinning a quasipseudometric, viz., the triangle inequality, is preserved under a dense reward setting as well, and we identify the key condition necessary for the triangle inequality to hold. Contrary to earlier findings in which dense rewards were shown to be detrimental to GCRL, we conjecture that dense reward functions satisfying this condition can only improve, never worsen, sample complexity. We evaluate this proposal on 12 standard GCRL benchmark environments featuring challenging continuous control tasks. Our empirical results confirm that training a quasipseudometric value function in our dense reward setting either improves upon or preserves the sample complexity of training with sparse rewards. This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity.
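For reference, the structure named in the abstract can be stated as a short sketch. This is the standard textbook definition of a quasipseudometric, not the paper's own notation: the symbols $d$ and $\mathcal{X}$ are assumptions introduced here for illustration, and the exact way $Q^\ast(s, a, g)$ maps onto $d$, as well as the condition on dense rewards, is given in the paper and is not reproduced here.

$$
d : \mathcal{X} \times \mathcal{X} \to [0, \infty), \qquad
d(x, x) = 0, \qquad
d(x, z) \le d(x, y) + d(y, z) \quad \text{for all } x, y, z \in \mathcal{X}.
$$

Unlike a metric, $d$ need not be symmetric ($d(x, y) \ne d(y, x)$ is allowed), and $d(x, y) = 0$ does not imply $x = y$.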
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=BOq66KrngZ
Changes Since Last Submission: Changes (in red):
* Ensured that the specificity of the dense rewards is clear;
* Revised the bullet point list in Sec. 1 to highlight the unique contributions compared to existing results;
* Added a roadmap for Sec. 3;
* Added a discussion on the estimation of $\eta$ (Sec. 3.1.5);
* Added (partial) results from experiments with the PQE architecture for the critic (Fig. 3 in Sec. 7; Appendix);
* Added sensitivity results for several values of $\eta$ (Fig. 4 in Sec. 7; Appendix);
* Other changes as advised.
Assigned Action Editor: ~Goran_Radanovic1
Submission Number: 4840