A Unified Framework for Average Reward Criterion and Risk

Published: 01 Jan 2024 · Last Modified: 09 Oct 2025 · BRACIS (1) 2024 · CC BY-SA 4.0
Abstract: The average reward criterion is used to solve infinite-horizon MDPs. This risk-neutral criterion depends on the stochastic process in the limit and can use (i) the accumulated reward at infinity, which considers sequences of states of size \(h=\infty \), or (ii) the steady-state distribution of the MDP (i.e., the probability that the system is in each state in the long term), which considers sequences of states of size \(h=1\). In many situations, it is desirable to consider risk at each stage of the process, which can be achieved by combining the average reward criterion with a utility function or a risk measure such as VaR or CVaR. The objective of this work is to propose a mathematical framework that allows a unified treatment of the existing literature on average reward and risk, including works that use exponential utility functions and CVaR, and to introduce interpretations with \(1 \le h \le \infty \) not present in the literature. These new interpretations allow differentiating policies that cannot be distinguished by existing criteria. A numerical example illustrates the behavior of the criteria under this new framework.
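As a toy sketch of the \(h=1\) (steady-state) interpretation described above: the risk-neutral average reward of a fixed policy is the expectation of the per-stage reward under the stationary distribution of the induced Markov chain, while a risk-sensitive variant such as CVaR averages only the worst \(\alpha\)-fraction of that reward distribution. All numbers below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical 3-state Markov chain induced by a fixed policy
# (illustrative numbers): P[s, s'] transition matrix, r[s] per-stage reward.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 0.0, 2.0])

# Steady-state distribution mu: solve mu P = mu subject to sum(mu) = 1
# (the h = 1 interpretation in the abstract).
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
mu, *_ = np.linalg.lstsq(A, b, rcond=None)

# Risk-neutral average reward: expectation of r under mu.
avg_reward = float(mu @ r)

def cvar(rewards, probs, alpha=0.3):
    """CVaR_alpha of a discrete reward distribution: the mean of the
    worst alpha-fraction of outcomes (lower tail of the rewards)."""
    order = np.argsort(rewards)              # worst rewards first
    rewards, probs = rewards[order], probs[order]
    cum = np.cumsum(probs)
    # Probability mass each outcome contributes to the alpha-tail.
    w = np.clip(np.minimum(cum, alpha) - np.concatenate(([0.0], cum[:-1])),
                0.0, None)
    return float(w @ rewards / alpha)

risk_value = cvar(r, mu, alpha=0.3)
print(avg_reward, risk_value)
```

Here the stationary distribution is \(\mu = (4/19, 10/19, 5/19)\), so the risk-neutral average reward is \(14/19 \approx 0.74\), while CVaR at \(\alpha = 0.3\) is 0 because the worst 30% of the stationary mass sits entirely on the zero-reward state; this is the kind of gap between risk-neutral and risk-sensitive evaluations that motivates a unified treatment.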