- Keywords: Deep reinforcement learning, risk-averse Markov decision processes, expectile risk measures, derivative pricing
- Abstract: Recently, equal risk pricing, a framework for fair derivative pricing, was extended to consider coherent risk measures. However, all current implementations either employ a static risk measure or rely on traditional dynamic programming solution schemes that are impracticable in realistic settings, i.e., when the number of underlying assets is large or only historical trajectories are available. This paper extends, for the first time, the deep deterministic policy gradient algorithm to the problem of solving a risk-averse Markov decision process that models risk using a time-consistent dynamic expectile risk measure. Our numerical experiments, which involve both a simple vanilla option and a more exotic basket option, confirm that the new ACRL algorithm produces high-quality hedging strategies that yield accurate prices in simple settings and outperform strategies produced with static risk measures when the risk is evaluated at later points in time.
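The dynamic expectile risk measure referenced in the abstract is built by composing one-step expectiles over time. As a minimal, hedged illustration (the function name and bisection approach are ours, not from the paper), the tau-expectile of a sample can be computed by solving its first-order condition, which balances weighted upside and downside deviations:

```python
import numpy as np

def expectile(x, tau, tol=1e-10):
    """Sample tau-expectile via bisection on the first-order condition:
    tau * E[(x - e)_+] = (1 - tau) * E[(e - x)_+].
    For tau = 0.5 this recovers the sample mean; tau > 0.5 weights
    losses (large x) more heavily, giving a risk-averse evaluation."""
    lo, hi = float(x.min()), float(x.max())
    while hi - lo > tol:
        e = 0.5 * (lo + hi)
        # g(e) is decreasing in e; a positive value means the root lies above e.
        g = tau * np.mean(np.maximum(x - e, 0.0)) \
            - (1.0 - tau) * np.mean(np.maximum(e - x, 0.0))
        if g > 0:
            lo = e
        else:
            hi = e
    return 0.5 * (lo + hi)
```

In an actor-critic scheme of the kind the paper describes, an analogous asymmetric squared loss would train the critic to output the one-step expectile of the hedging loss rather than its expectation; the standalone bisection above only illustrates the statistic itself.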
- One-sentence Summary: This paper extends for the first time a model-free off-policy actor-critic algorithm to the problem of solving a risk-averse Markov decision process and applies it to calculate the equal risk prices of financial derivatives in an incomplete market.
- Supplementary Material: zip