LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

ICLR 2026 Conference Submission14901 Authors

19 Sept 2025 (modified: 08 Oct 2025). License: CC BY 4.0
Keywords: Hierarchical Reinforcement Learning, Large Language Models
TL;DR: We present LGR2, a hierarchical reinforcement learning based approach that employs language-guided reward generation to address HRL non-stationarity in robotic control tasks.
Abstract: Large language models (LLMs) have shown remarkable abilities in logical reasoning, in-context learning, and code generation. However, translating natural language instructions into effective robotic control policies remains challenging, particularly for long-horizon tasks with sparse rewards. While Hierarchical Reinforcement Learning (HRL) provides a natural framework for such tasks, it suffers from reward-level non-stationarity: as the lower-level policy evolves during training, it destabilizes higher-level learning. We propose Language Guided Reward Relabeling (LGR2), a novel HRL framework that leverages LLMs to generate language-guided reward functions for higher-level policy training. By using LLM-derived reward parameters that remain fixed across training iterations, LGR2 mitigates reward-level non-stationarity in off-policy HRL while maintaining semantic alignment with the natural language instructions. To enhance sample efficiency in sparse-reward environments, we integrate goal-conditioned hindsight experience relabeling with the language-guided rewards. Extensive experiments across simulated robotic navigation and manipulation tasks demonstrate that LGR2 achieves substantial improvements over hierarchical and flat baselines, with success rates reaching 60-80% on challenging tasks compared to 10-30% for competing methods. Our initial sim-to-real experiments on pick-and-place and bin tasks show promising transfer, achieving over 50% success rates while outperforming the baselines.
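The core idea, as the abstract describes it, can be sketched in a few lines: a reward function whose parameters are derived once from language (and then held fixed) scores higher-level subgoals, and hindsight relabeling substitutes achieved goals under that same fixed reward. The sketch below is an illustration under assumed names (`llm_reward`, `hindsight_relabel`, the weight vector), not the authors' implementation; the actual LLM-generated reward in the paper may take any functional form.

```python
import numpy as np

def llm_reward(state, subgoal, weights=np.array([1.0, 1.0])):
    # Stand-in for an LLM-derived reward: weighted negative distance
    # between the achieved state and the commanded subgoal. Because the
    # weights are fixed across training iterations, the higher-level
    # reward signal does not drift as the lower-level policy changes,
    # which is the non-stationarity the abstract targets.
    return -float(np.dot(weights, np.abs(state - subgoal)))

def hindsight_relabel(transition, achieved_goal):
    # Goal-conditioned hindsight relabeling: replace the original goal
    # with the goal actually achieved, and recompute the reward under
    # the same fixed language-guided reward function.
    state, _, _ = transition
    return (state, achieved_goal, llm_reward(state, achieved_goal))

state = np.array([0.2, 0.5])
goal = np.array([1.0, 1.0])
original = (state, goal, llm_reward(state, goal))        # sparse-feeling, far from goal
relabeled = hindsight_relabel(original, achieved_goal=state)
print(relabeled[2] == 0.0)  # True: relabeled transition gets maximal reward
```

Relabeling with the achieved goal turns an unsuccessful transition into a successful one for learning purposes, which is what makes the scheme sample-efficient in sparse-reward settings.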
Primary Area: reinforcement learning
Submission Number: 14901