Uncertainty-aware Reward Design Process

Yang yang; Xiaolu Zhou; Bosong Ding; Miao Xin

Published: 21 Nov 2025, Last Modified: 21 Nov 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: Designing effective reward functions is a cornerstone of reinforcement learning (RL), yet it remains challenging because conventional reward engineering is inefficient and inconsistent. Recent work has explored using large language models (LLMs) to automate reward function design. However, LLMs’ limited numerical optimization capabilities often result in suboptimal reward hyperparameter tuning, while non-selective validation of candidate reward functions incurs substantial computational overhead. To address these challenges, we propose the Uncertainty-aware Reward Design Process (URDP), a novel framework that integrates LLMs to streamline reward function design and evaluation. URDP quantifies the uncertainty of candidate reward functions through self-consistency analysis, enabling simulation-free identification of ineffective reward components while discovering novel ones. Furthermore, we introduce uncertainty-aware Bayesian optimization (UABO), which incorporates uncertainty estimation to improve hyperparameter configuration. Finally, we construct a bi-level optimization framework by decoupling reward component optimization from hyperparameter tuning. URDP thereby combines the reward-logic reasoning of LLMs with the numerical optimization strengths of Bayesian optimization. We conduct a comprehensive evaluation of URDP across 35 diverse tasks spanning three benchmark environments: IsaacGym, Bidexterous Manipulation, and ManiSkill2. Our experimental results demonstrate that URDP not only generates higher-quality reward functions but also significantly improves the efficiency of automated reward design compared to existing approaches. We open-source all code at https://github.com/Yy12136/URDP.
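The abstract does not specify how self-consistency is turned into an uncertainty score. As a rough illustration only, the sketch below assumes a simple agreement-frequency proxy: several candidate reward functions are sampled, and a component's uncertainty is one minus the fraction of samples that contain it. The function names, the component names, and the 0.5 threshold are all hypothetical and are not taken from the URDP code.

```python
from collections import Counter


def component_uncertainty(samples):
    """Return {component: uncertainty} for sampled candidate reward functions.

    `samples` is a list of sets of component names, one set per sampled
    candidate. Uncertainty is 1 - (share of samples containing the component).
    This is an assumed proxy, not the measure defined in the paper.
    """
    n = len(samples)
    counts = Counter(name for s in samples for name in set(s))
    return {name: 1.0 - c / n for name, c in counts.items()}


def filter_components(samples, max_uncertainty=0.5):
    """Keep components whose uncertainty is at most `max_uncertainty`.

    No simulation is run; the decision uses only agreement across samples.
    """
    scores = component_uncertainty(samples)
    return sorted(name for name, u in scores.items() if u <= max_uncertainty)


# Hypothetical component sets from four sampled candidates.
samples = [
    {"distance_to_goal", "action_penalty"},
    {"distance_to_goal", "velocity_bonus"},
    {"distance_to_goal", "action_penalty"},
    {"distance_to_goal"},
]

kept = filter_components(samples)
```

Under this assumption, components that appear consistently across samples are retained for hyperparameter tuning, while rarely proposed ones are dropped before any RL training is run.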

Submission Length: Regular submission (no more than 12 pages of main content)

Supplementary Material: zip

Assigned Action Editor: ~Sebastian_Tschiatschek1

Submission Number: 5344
