Bridging Distributional and Risk-Sensitive Reinforcement Learning: Balancing Statistical, Computational, and Risk Considerations
Abstract: High-stakes applications such as finance and healthcare require risk-sensitive methods that maximize a risk measure of the return distribution. Existing risk-sensitive reinforcement learning (RSRL) methods face computational and statistical challenges because risk measures are nonlinear. This paper addresses these challenges by proposing computationally efficient distributional reinforcement learning (DRL) algorithms with regret guarantees. In particular, we introduce two variants of the principled DRL algorithm \texttt{RODI} \cite{liang2022bridging} that use a novel distribution representation and projection method, maintaining the regret bound while remaining computationally efficient. Our algorithms, \texttt{RODI-Rep}, achieve improved regret compared with traditional non-distributional RL methods, as shown through theoretical analysis and empirical validation.
Format: Long format (up to 8 pages + refs, appendix)
Publication Status: Yes
Submission Number: 78