Abstract: Reinforcement learning (RL) is widely used to tackle optimal control problems, with optimality conditions serving as the guiding principle of algorithm design. Two key optimality conditions are the Pontryagin maximum principle (PMP) and the Hamilton–Jacobi–Bellman (HJB) equation. The relationship between these conditions is vital for developing effective policy learning algorithms. Existing studies mainly focus on their relationship under open-loop optimality, but this result cannot be directly extended to nonoptimal or closed-loop cases due to the absence of the extremum condition and the presence of a nonzero partial-derivative term. This article unifies the relationship between the PMP and the HJB equation in all cases by treating optimal control problems as nonholonomic Lagrangian systems, and proves the intrinsic equivalence between the value function and the costate variable from the perspective of Hamiltonian dynamics. We redefine the costate variable as the Legendre transform of the state derivative in nonholonomic Lagrangian systems, where the Weierstrass condition is imposed as a constraint in optimal cases and the fixed-policy condition in nonoptimal cases. By exploiting the antisymmetric structure of the canonical equations, we identify conservation properties of optimal control under which the symplectic form remains invariant in all cases. Additionally, we prove that the costate variable satisfies the same differential equation and boundary conditions as the partial derivative of the value function with respect to the state in optimal and nonoptimal, open-loop and closed-loop cases. Numerical experiments are conducted to verify our theoretical results. This finding establishes more readily available conservation conditions, thereby providing a high-level perspective on algorithm design.
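For orientation, the following is a minimal sketch, in standard finite-horizon notation that may differ from the article's own conventions, of the HJB equation, the PMP canonical equations, the Legendre transform, and the costate–value-function identity that the abstract refers to; the running cost l, dynamics f, terminal cost \varphi, and the Hamiltonian sign convention here are illustrative assumptions, not the paper's definitions.

\begin{align*}
% HJB equation for the value function V(t,x), with terminal condition V(T,x) = \varphi(x)
-\partial_t V(t,x) &= \min_{u}\,\bigl[\, l(x,u) + \partial_x V(t,x)^{\top} f(x,u) \,\bigr], \qquad V(T,x) = \varphi(x), \\
% Hamiltonian and PMP canonical equations for state x(t) and costate p(t)
H(x,p,u) &= l(x,u) + p^{\top} f(x,u), \qquad
\dot{x} = \partial_p H, \quad \dot{p} = -\partial_x H, \quad p(T) = \partial_x \varphi\bigl(x(T)\bigr), \\
% Legendre transform linking a Lagrangian L(x,\dot{x}) to the Hamiltonian picture
p &= \partial_{\dot{x}} L(x,\dot{x}), \qquad H(x,p) = p^{\top}\dot{x} - L(x,\dot{x}), \\
% Costate--value-function identity along an optimal trajectory x^{*}(t)
p(t) &= \partial_x V\bigl(t, x^{*}(t)\bigr).
\end{align*}

In this standard picture the identity in the last line is the classical link between PMP and HJB along optimal open-loop trajectories; the contribution stated in the abstract is to extend an analogous equivalence to nonoptimal and closed-loop settings through the nonholonomic Lagrangian formulation and the invariance of the symplectic form.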
External IDs: doi:10.1109/tai.2025.3557399