Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations

Published: 16 Sept 2025, Last Modified: 16 Sept 2025
Venue: CoRL 2025 Poster
License: CC BY 4.0
Keywords: Safe RL, State-Augmentation, Hamilton-Jacobi, Reinforcement Learning
TL;DR: Novel Bellman equations based on maximum value propagation for multi-objective tasks that require memory (state-augmentation)
Abstract: Hard constraints in reinforcement learning (RL), whether imposed via the reward function or the model architecture, often degrade policy performance. Lagrangian methods offer a way to blend objectives with constraints, but often require intricate reward engineering and careful parameter tuning. In this work, we extend recent advances connecting Hamilton-Jacobi (HJ) equations with RL to propose two novel value functions for dual-objective satisfaction. Specifically, we address (1) the \textbf{Reach-Always-Avoid} (RAA) problem of achieving distinct reward and penalty thresholds, and (2) the \textbf{Reach-Reach} (RR) problem of achieving thresholds of two distinct rewards. By decomposing these problems, we derive explicit, tractable Bellman forms. The RAA and RR problems differ fundamentally from standard sum-of-rewards problems and temporal logic problems, providing a new perspective on constrained decision-making. Building on this analysis, we propose a variant of Proximal Policy Optimization (\textbf{DO-HJ-PPO}) that solves these problems. Across a range of safe-arrival and multi-target achievement tasks, we demonstrate that DO-HJ-PPO out-competes a variety of baselines.
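For intuition on the "maximum value propagation" that the abstract refers to, the sketch below shows a standard discounted reach-type Bellman backup from prior HJ-RL literature on a toy tabular problem; it is not the paper's novel Reach-Always-Avoid or Reach-Reach formulation. The chain environment, reward function l, and discount gamma are hypothetical choices made only for illustration.

```python
# Illustrative sketch: a standard discounted "reach" Bellman backup
# (maximum value propagation), NOT the paper's RAA/RR value functions.
# The chain MDP, reward l(s), and gamma are assumed for illustration.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.95

def step(s, a):
    # Deterministic 1-D chain: action 0 moves left, action 1 moves right.
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

# l(s): reward whose best future value the reach value function tracks.
l = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

V = l.copy()
for _ in range(500):  # value iteration to a fixed point
    V_next = np.empty_like(V)
    for s in range(n_states):
        best_future = max(V[step(s, a)] for a in range(n_actions))
        # Maximum value propagation: the value reflects the best reward
        # attainable at any future time, discounted for contraction.
        V_next[s] = (1 - gamma) * l[s] + gamma * max(l[s], best_future)
    if np.max(np.abs(V_next - V)) < 1e-8:
        V = V_next
        break
    V = V_next

print(V)  # states nearer the high-reward end of the chain have higher value
```

Note how the backup replaces the usual sum of rewards with a maximum over the current reward and the best successor value; the paper's contribution is extending this style of backup to the dual-objective RAA and RR settings.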
Supplementary Material: zip
Submission Number: 8