Climb with SHERPA: Heuristic-Guided Reinforcement Learning via Segmented Experience Relay

Published: 08 May 2026, Last Modified: 08 May 2026, Venue: ICRA 2026 Workshop RL4IL Poster, Readers: Everyone, License: CC BY 4.0
Keywords: Reinforcement Learning, Transfer Learning, Deep Learning in Grasping and Manipulation
Abstract: In sparse-reward, long-horizon domains, reinforcement learning (RL) often suffers from slow convergence and instability, complicating robotic manipulation. Previous heuristic-guided approaches have relied on step-level actions and imitation loss, but struggle to maintain temporal coherence or solve multi-stage tasks. We present SHERPA (Segmented Heuristic Experience Relay for Policy Assistance), which alternates control between a heuristic policy and an RL policy in contiguous segments, preserving coherent sub-trajectories and yielding more stable learning. Unlike purely imitative objectives, SHERPA leverages expert-like heuristic guidance while optimizing its policy through RL, thereby enabling performance that ultimately surpasses the heuristic itself. This behavior can be further strengthened by incorporating phase-specific rewards naturally derived from heuristic rules. Across ten tasks in the Fetch and Panda suites, including four long-horizon benchmarks, SHERPA consistently outperforms RL, IL, and heuristic-guided baselines and demonstrates robustness even under degraded heuristics. Real-world experiments on a UR5 robot further confirm SHERPA’s practical scalability.
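The segment-level alternation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the toy `LineWorld` task, the fixed `segment_len`, the per-segment coin flip `p_heuristic`, and the names `Transition` and `segmented_rollout` are all hypothetical, and the abstract does not specify how SHERPA schedules segments or updates its policy. The sketch only shows one way a single controller could own each contiguous block of steps, so that every stored sub-trajectory is temporally coherent.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transition:
    state: int
    action: int
    reward: float
    next_state: int
    done: bool
    controller: str  # "heuristic" or "rl": who chose this action


class LineWorld:
    """Hypothetical sparse-reward task: start at 0, reward 1.0 only at `goal`."""

    def __init__(self, goal: int = 10, horizon: int = 40):
        self.goal, self.horizon = goal, horizon
        self.pos, self.t = 0, 0

    def reset(self) -> int:
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.pos += 1 if action > 0 else -1
        self.t += 1
        reached = self.pos == self.goal
        done = reached or self.t >= self.horizon
        return self.pos, (1.0 if reached else 0.0), done


def segmented_rollout(
    env: LineWorld,
    heuristic: Callable[[int], int],
    learner: Callable[[int], int],
    segment_len: int,
    p_heuristic: float,
    rng: random.Random,
) -> List[Transition]:
    """Collect one episode, switching controllers only at segment boundaries."""
    state, done, buffer = env.reset(), False, []
    while not done:
        # Assumed switching rule: one coin flip per segment, not per step.
        use_heuristic = rng.random() < p_heuristic
        policy, name = (heuristic, "heuristic") if use_heuristic else (learner, "rl")
        for _ in range(segment_len):
            action = policy(state)
            next_state, reward, done = env.step(action)
            buffer.append(Transition(state, action, reward, next_state, done, name))
            state = next_state
            if done:
                break
    return buffer


rng = random.Random(0)
heuristic = lambda s: 1                    # rule-based stand-in: always move toward the goal
learner = lambda s: rng.choice([-1, 1])    # placeholder for a trainable RL policy
episode = segmented_rollout(LineWorld(), heuristic, learner,
                            segment_len=5, p_heuristic=0.5, rng=rng)
```

In this sketch the `controller` tag on each transition would let a downstream learner treat heuristic-generated and policy-generated segments differently; whether and how SHERPA does so is not stated in the abstract.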
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 14