Dual-Objective Reinforcement Learning through Novel Hamilton-Jacobi-Bellman Formulations

Published: 22 Nov 2025, Last Modified: 22 Nov 2025 · SAFE-ROL Poster · CC BY-NC-SA 4.0
Keywords: Safe Reinforcement Learning, Multi-Objective, Hamilton-Jacobi
TL;DR: Two novel Hamilton-Jacobi-Bellman value formulations for dual-objective RL (Reach-Always-Avoid and Reach-Reach), solved with a PPO variant, DO-HJ-PPO.
Abstract: Hard constraints in reinforcement learning (RL), whether imposed through the reward function or the model architecture, often degrade policy performance. Lagrangian methods offer a way to blend objectives with constraints, but frequently demand intricate reward engineering and parameter tuning. In this work, we extend recent advances connecting Hamilton-Jacobi (HJ) equations with RL and propose two novel value functions for dual-objective satisfaction. Specifically, we address (1) the Reach-Always-Avoid (RAA) problem, achieving distinct reward and penalty thresholds, and (2) the Reach-Reach (RR) problem, achieving thresholds of two distinct rewards. By decomposing these problems, we derive explicit, tractable Bellman forms. The RAA and RR problems differ fundamentally from standard sum-of-rewards problems and temporal-logic problems, offering a new perspective on constrained decision-making. Building on this analysis, we propose a variant of Proximal Policy Optimization (DO-HJ-PPO) that solves both problems. Across a range of safe-arrival and multi-target-achievement tasks, we demonstrate that DO-HJ-PPO outperforms a variety of baselines.
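
For readers unfamiliar with the HJ-RL connection, the sketch below shows one well-known backup of this kind from the reach-avoid RL literature: a discounted reach-avoid Bellman equation, where ℓ measures the margin to the target and g the margin to safety. The symbols ℓ, g, γ, and the dynamics f are illustrative assumptions here, not the paper's exact RAA/RR formulations, which generalize backups of this shape.

% Illustrative sketch, not the paper's exact formulation:
% a discounted reach-avoid Bellman backup from the HJ-RL literature.
% \ell(x) : target margin (\ell(x) > 0 iff x is in the target set)
% g(x)    : safety margin (g(x) < 0 iff x is in the failure set)
% f(x,u)  : (assumed deterministic) dynamics, \gamma : discount factor
\begin{equation*}
V(x) \;=\; (1-\gamma)\,\min\{\ell(x),\, g(x)\}
\;+\; \gamma\,\min\Bigl\{ g(x),\; \max\bigl\{ \ell(x),\; \max_{u} V\bigl(f(x,u)\bigr) \bigr\} \Bigr\}
\end{equation*}

In this backup, the outer min with g enforces avoidance along the entire trajectory, while the inner max with ℓ records the best reach margin attained so far; the RAA and RR objectives described above pair analogous margins for a reward and a penalty, or for two distinct rewards.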
Supplementary Zip: zip
Submission Number: 27