Pontryagin-Guided Direct Policy Optimization for Continuous-Time Portfolio Problem

Jeonggyu Huh, Seungwon Jeong, Jaegi Jeon

Published: 29 Sept 2025, Last Modified: 06 May 2026, Journal of Industrial and Management Optimization, Everyone, CC BY 4.0

Abstract: We present Pontryagin-Guided Direct Policy Optimization (PG-DPO), a framework for solving continuous-time portfolio optimization problems involving both consumption and investment decisions. Integrating Pontryagin's Maximum Principle (PMP) within a neural network pipeline, PG-DPO bypasses traditional value function approximation and directly optimizes policy parameters using adjoint processes associated with the current policy, computed via automatic differentiation. An optional alignment penalty, explicitly derived from PMP conditions, significantly accelerates convergence and improves policy stability during training. Numerical experiments validate the framework's efficacy: PG-DPO accurately recovers the closed-form solution for the classical Merton problem and, crucially, demonstrates its capability to handle more realistic, state-dependent dynamics involving stochastic factors, effectively capturing intertemporal hedging demands. These results highlight that the PMP-guided deep learning approach offers an effective and potentially efficient pathway for direct policy optimization in complex continuous-time stochastic control settings within finance.
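As a point of reference for the Merton benchmark mentioned in the abstract, the following is a minimal sketch of optimizing a policy parameter directly and checking it against the closed-form answer. It is not the authors' PG-DPO implementation: it assumes a simplified setting with CRRA utility of terminal wealth only (no consumption), a single risky asset with constant coefficients, and a constant investment fraction as the whole "policy". It also replaces the neural network and automatic differentiation with plain gradient ascent on a hand-derived gradient. The function names and parameter values are hypothetical, chosen only for illustration.

```python
# Assumed toy setting (not from the paper): one risky asset with drift mu and
# volatility sigma, risk-free rate r, CRRA risk aversion gamma, and a constant
# fraction pi of wealth held in the risky asset.


def merton_fraction(mu, r, sigma, gamma):
    """Closed-form Merton fraction for CRRA utility: (mu - r) / (gamma * sigma^2)."""
    return (mu - r) / (gamma * sigma ** 2)


def ce_growth_rate(pi, mu, r, sigma, gamma):
    """Certainty-equivalent growth rate of wealth under a constant fraction pi.

    For a constant pi, terminal wealth is log-normal, and maximizing expected
    CRRA utility of terminal wealth is equivalent to maximizing this quantity.
    """
    return r + pi * (mu - r) - 0.5 * gamma * pi ** 2 * sigma ** 2


def optimize_fraction(mu, r, sigma, gamma, lr=5.0, steps=200, pi0=0.0):
    """Gradient ascent on the policy parameter pi.

    The gradient of ce_growth_rate with respect to pi is written out by hand.
    The iteration is stable only if lr * gamma * sigma^2 < 2.
    """
    pi = pi0
    for _ in range(steps):
        grad = (mu - r) - gamma * pi * sigma ** 2
        pi += lr * grad
    return pi


if __name__ == "__main__":
    params = dict(mu=0.08, r=0.02, sigma=0.2, gamma=2.0)  # hypothetical values
    print("closed form:", merton_fraction(**params))
    print("optimized:  ", optimize_fraction(**params))
```

In this toy case the objective is a concave quadratic in the policy parameter, so gradient ascent converges to the closed-form fraction. The state-dependent settings described in the abstract have no such shortcut, which is where a learned policy and adjoint information become relevant.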