Keywords: Reinforcement learning, Mean-variance trade-off
Abstract: In reinforcement learning (RL) for sequential decision making under uncertainty, existing methods for handling the mean-variance (MV) trade-off suffer from computational difficulties in estimating the gradient of the variance term. In this paper, we aim to obtain MV-efficient policies, that is, policies that achieve Pareto efficiency with respect to the MV trade-off. To this end, we train an agent to maximize the expected quadratic utility, whose maximizer corresponds to a Pareto-efficient policy. Our approach avoids the computational difficulties because it does not require gradient estimation of the variance. In experiments, we confirm the effectiveness of the proposed method.
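To illustrate the idea, here is a minimal sketch (not the paper's actual algorithm) of why maximizing an expected quadratic utility sidesteps the variance-gradient issue. With utility U(G) = G - λG², the objective E[U(G)] is a single expectation, so a standard single-sample REINFORCE estimator applies; the environment, the Gaussian policy, and the risk-aversion coefficient λ below are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.1     # risk-aversion coefficient (assumed value)
lr = 0.02     # learning rate
theta = 0.0   # mean of a Gaussian policy over a 1-D action

def reward(a, rng):
    # Hypothetical toy bandit: larger actions raise the mean reward
    # but also the noise, creating a mean-variance trade-off.
    return a + (0.5 + abs(a)) * rng.standard_normal()

for step in range(2000):
    a = theta + rng.standard_normal()   # sample action from N(theta, 1)
    g = reward(a, rng)                  # observed return
    utility = g - lam * g**2            # quadratic utility U(G) = G - lam * G^2
    score = a - theta                   # d/dtheta of log N(a; theta, 1)
    theta += lr * score * utility       # one-sample REINFORCE on E[U(G)]
```

Note that estimating the gradient of Var[G] = E[G²] - E[G]² directly would require nested or double sampling because of the squared expectation; the single expectation E[U(G)] needs no such machinery.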