A Comparative Study of Deep Reinforcement Learning Algorithms for Dynamic Option Hedging

Andrei Neagu; Frédéric Godin; Leila Kosseim

Published: 02 Mar 2026, Last Modified: 07 Mar 2026. Venue: ICLR 2026 Workshop AIMS. Readers: Everyone. License: CC BY 4.0

Keywords: Deep Hedging, Deep Reinforcement Learning, Reinforcement Learning, Dynamic Hedging, Option Hedging, Computational Finance, Sequential Decision-Making

Abstract: Dynamic hedging involves periodically trading financial assets to offset the risk associated with a correlated liability. Deep Reinforcement Learning (DRL) algorithms have been used to find optimal solutions to dynamic hedging problems by framing them as sequential decision-making problems. However, most previous work assesses the performance of only one or two DRL algorithms, making an objective comparison across algorithms difficult. In this paper, we compare the performance of eight DRL algorithms in the context of dynamic hedging: Monte Carlo Policy Gradient (MCPG), Proximal Policy Optimization (PPO), four variants of Deep Q-Learning (DQL), and two variants of Deep Deterministic Policy Gradient (DDPG). Two of these variants represent a novel application to the task of dynamic hedging. In our experiments, we use the Black-Scholes delta hedge as a baseline and simulate the dataset using a GJR-GARCH(1,1) model. Results show that MCPG obtains the best performance in terms of the root semi-quadratic penalty. Moreover, MCPG is the only algorithm to outperform the Black-Scholes delta hedge baseline within the allotted computational budget, possibly due to the sparsity of rewards in our environment.
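The abstract names a baseline (the Black-Scholes delta hedge) and an evaluation metric (the root semi-quadratic penalty) without defining them. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes a European call option, a hedging error defined as liability minus hedging portfolio value (so positive values are shortfalls), and a penalty that squares only the shortfalls before averaging and taking the root. The function names `bs_call_delta` and `root_semi_quadratic_penalty` are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_delta(spot: float, strike: float, rate: float,
                  vol: float, tau: float) -> float:
    """Black-Scholes delta of a European call (assumed option type).

    tau is the time to maturity in years and vol the annualized volatility.
    """
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * tau) / (
        vol * math.sqrt(tau)
    )
    return norm_cdf(d1)


def root_semi_quadratic_penalty(hedging_errors: list[float]) -> float:
    """Root of the mean squared shortfall over simulated paths.

    Assumed convention: error = liability - portfolio value, so only
    positive errors (losses for the hedger) are penalized; gains count as 0.
    """
    squared_shortfalls = [max(e, 0.0) ** 2 for e in hedging_errors]
    return math.sqrt(sum(squared_shortfalls) / len(squared_shortfalls))


if __name__ == "__main__":
    # At-the-money call, zero rate, 20% volatility, one year to maturity.
    print(bs_call_delta(spot=100.0, strike=100.0, rate=0.0, vol=0.2, tau=1.0))
    # Four simulated terminal hedging errors; only the shortfall of 3 counts.
    print(root_semi_quadratic_penalty([3.0, -4.0, 0.0, -1.0]))
```

Because the penalty ignores gains, a policy is judged only on how badly it falls short of the liability, which is one reason an asymmetric metric like this can rank algorithms differently from a symmetric mean squared error.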

Track: Long Paper

Submission Number: 60