Keywords: Reinforcement Learning; Option Hedging; Finance
Abstract: Options are financial derivatives widely used for risk management and corporate operations. Option hedging aims to mitigate the investment risk arising from fluctuations in the underlying asset's price by buying and selling other financial products. Traditional hedging strategies based on the Black-Scholes model face practical limitations because they assume constant volatility and neglect transaction costs. Recently, reinforcement learning (RL) has gained attention in the study of option hedging strategies, but several challenges remain: current methods rely only on real-time market data (e.g., underlying asset prices, holdings, remaining option term) to determine optimal positions, underutilizing the potential value of historical data; existing approaches focus on the expected hedging cost and overlook the full distribution of costs; and, in terms of training data generation, the single simulation model commonly used performs well under specific conditions but struggles to ensure model robustness across diverse datasets. To address these issues, we propose a novel distributional RL option hedging method that incorporates historical information. Historical states are included in the state variables, and a gated recurrent unit (GRU) network layer extracts historical information, which is combined with current information from fully connected layers and fed to subsequent network layers, ensuring the agent considers both current and historical market information when learning hedging strategies. The value network outputs a series of quantiles, and the Quantile Huber Loss function fits their distribution, so that strategies are evaluated on the full cost distribution rather than its expected value. To diversify data sources, we use a combination of the Black-Scholes model, the Binomial model, and the Heston model to simulate a large volume of option data. Experimental results show that our method significantly reduces hedging costs and demonstrates strong adaptability and practicality under various market conditions.
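To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch (not the authors' implementation): a GRU branch encodes the sequence of historical states, a fully connected branch encodes the current state, the concatenated features are mapped to a set of quantile estimates, and a standard quantile Huber loss (in the style of QR-DQN) fits their distribution. Class and function names, layer sizes, and the number of quantiles are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantileHedgingNet(nn.Module):
    """Value network combining a GRU encoding of historical states with the
    current market state; outputs quantiles of the hedging-cost distribution."""

    def __init__(self, hist_dim, cur_dim, hidden_dim=64, n_quantiles=32):
        super().__init__()
        self.gru = nn.GRU(hist_dim, hidden_dim, batch_first=True)  # historical branch
        self.fc_cur = nn.Linear(cur_dim, hidden_dim)                # current-state branch
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_quantiles),                     # quantile estimates
        )

    def forward(self, hist_seq, cur_state):
        # hist_seq: (batch, T, hist_dim); cur_state: (batch, cur_dim)
        _, h = self.gru(hist_seq)                  # final hidden state summarizes history
        h = h.squeeze(0)
        c = F.relu(self.fc_cur(cur_state))
        return self.head(torch.cat([h, c], dim=-1))  # (batch, n_quantiles)


def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Quantile Huber loss between predicted quantiles and target samples.
    pred: (batch, N) predicted quantiles; target: (batch, M) target values;
    taus: (N,) quantile fractions in (0, 1)."""
    u = target.unsqueeze(1) - pred.unsqueeze(2)          # pairwise TD errors (batch, N, M)
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    weight = (taus.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    return (weight * huber).mean()
```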
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10515