Keywords: Sequential decision making; Constrained reinforcement learning; Stochastic programming; Duality; Energy systems.
TL;DR: Efficiently Training Deep-Learning Parametric Policies for Sequential Decision Problems using Lagrangian Duality.
Abstract: Sequential Decision Making under Uncertainty (SDMU) arises in many domains, including energy, finance, and supply chains. Stochastic Dual Dynamic Programming (SDDP) is a powerful approach to such problems, but it assumes convexity and stage-wise independence of the uncertainty; Two-Stage Linear Decision Rules (TS-LDRs) relax the independence assumption and yield fast policies, but are limited in non-convex environments. This paper introduces Two-Stage General Decision Rules (TS-GDR) and an instantiation, Two-Stage Deep Decision Rules (TS-DDR), which train nonlinear, time-invariant policies by combining deterministic optimization in the forward pass with duality-based closed-form gradients in the backward pass. On the Long-Term Hydrothermal Dispatch (LTHD) problem for the Bolivian grid, TS-DDR improves solution quality and reduces training and inference times by orders of magnitude compared to SDDP, Reinforcement Learning (RL), and TS-LDR across linear (DCLL), conic (SOC), and non-convex (AC) power-flow formulations; it also outperforms a model-predictive control baseline on the stochastic Goddard rocket control problem.
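Below is a minimal sketch of the duality-based training idea the abstract describes, assuming a linear second stage: a PyTorch policy proposes a first-stage decision, the forward pass solves the resulting deterministic LP with SciPy, and the backward pass reads the policy gradient off the LP duals in closed form (envelope theorem) rather than differentiating through the solver. The `StageValue` helper, the matrices `c`, `W`, `h`, `T`, and all dimensions are hypothetical illustrations, not taken from the paper.

```python
import numpy as np
import torch
from scipy.optimize import linprog

# Second-stage LP:  V(u) = min_y  c^T y   s.t.  W y <= h - T u,  y >= 0.
# By LP sensitivity, dV/d(b_ub) equals the dual marginals of the
# inequality constraints, so with b_ub = h - T u we get dV/du = -T^T * marginals.
class StageValue(torch.autograd.Function):
    @staticmethod
    def forward(ctx, u, c, W, h, T):
        b_ub = h - T @ u.detach().cpu().numpy()
        res = linprog(c, A_ub=W, b_ub=b_ub, bounds=(0, None), method="highs")
        assert res.status == 0, "second-stage LP must be solvable"
        # Closed-form gradient from the duals; no backprop through the solver.
        ctx.grad_u = torch.from_numpy(-T.T @ res.ineqlin.marginals)
        return u.new_tensor(res.fun)

    @staticmethod
    def backward(ctx, grad_out):
        # Gradients only for u; the LP data (c, W, h, T) are constants here.
        return grad_out * ctx.grad_u.to(grad_out.dtype), None, None, None, None

# Illustrative problem data and a tiny policy network (all sizes hypothetical).
rng = np.random.default_rng(0)
n_state, n_u, n_y, n_con = 4, 3, 5, 6
c = rng.uniform(1, 2, n_y)                    # positive recourse costs
W = -np.abs(rng.normal(size=(n_con, n_y)))    # keeps the LP feasible for any u
h = rng.uniform(-10, -5, n_con)               # makes constraints bind (nonzero duals)
T = rng.normal(size=(n_con, n_u))

policy = torch.nn.Sequential(torch.nn.Linear(n_state, 16), torch.nn.Tanh(),
                             torch.nn.Linear(16, n_u))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    state = torch.randn(n_state)               # sampled exogenous state
    u = policy(state)                          # first-stage decision
    # First-stage quadratic cost plus second-stage LP value:
    loss = 0.1 * u.pow(2).sum() + StageValue.apply(u, c, W, h, T)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sketch is the split the abstract emphasizes: each training step costs one deterministic solve in the forward pass, and the backward pass is free because the gradient is assembled from the dual variables instead of unrolling or differentiating the optimization.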
Submission Number: 67