Sliced Distributional Reinforcement Learning

ICLR 2026 Conference Submission 13872 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Distributional Reinforcement Learning, Multivariate Reinforcement Learning, Sliced Probability Divergences, Bellman Operator Contraction
TL;DR: We introduce Sliced Distributional RL, lifting one-dimensional divergences to multivariate settings with contraction guarantees for anisotropic Bellman updates, and demonstrate strong results on multivariate and multi-horizon tasks.
Abstract: Distributional reinforcement learning (DRL) models full return distributions rather than expectations, but extending it to multivariate settings is challenging: univariate tractability is lost, and existing multivariate approaches are either computationally expensive or lack contraction guarantees. We propose Sliced Distributional Reinforcement Learning (SDRL), which lifts tractable one-dimensional divergences to the multivariate case through random projections and aggregation. We prove Bellman contraction under uniform slicing for shared scalar discounts and under max slicing for general anisotropic matrix-discount updates, providing the first contraction result in this setting. SDRL accommodates a broad class of base divergences, instantiated here with the Wasserstein, Cramér, and Maximum Mean Discrepancy (MMD) distances. In experiments, SDRL achieves competitive results on multivariate control tasks in MO-Gymnasium. As an application of matrix discounting, we extend multi-horizon RL with hyperbolic scalarization to the distributional regime. Taken together, these findings position slicing as a principled and scalable foundation for multivariate distributional reinforcement learning.
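To make the slicing idea concrete, below is a minimal NumPy sketch (not the authors' implementation) of a sliced Wasserstein-1 divergence between two equal-size sets of d-dimensional return samples, computed by projecting onto random directions and averaging the resulting one-dimensional distances; the function name, sample shapes, and number of projections are illustrative assumptions.

```python
import numpy as np

def sliced_wasserstein1(x, y, num_slices=64, rng=None):
    """Illustrative sketch: lift the 1-D Wasserstein-1 distance to
    multivariate return samples x, y of shape [n, d] by averaging it
    over random projection directions (uniform slicing)."""
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    # Sample random unit vectors (slicing directions) on the sphere.
    dirs = rng.normal(size=(num_slices, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    total = 0.0
    for u in dirs:
        # Project both sample sets onto the direction u.
        px, py = np.sort(x @ u), np.sort(y @ u)
        # For equal-size empirical distributions, the 1-D Wasserstein-1
        # distance reduces to the mean gap between sorted samples.
        total += np.mean(np.abs(px - py))
    # Aggregate by averaging over directions (uniform slicing).
    return total / num_slices
```

Other one-dimensional base divergences mentioned in the abstract (Cramér, MMD) would slot into the same project-then-aggregate loop in place of the sorted-sample comparison.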
Primary Area: reinforcement learning
Submission Number: 13872