Heuristic-Guided Distributional Reinforcement Learning

Published: 03 Jun 2026, Last Modified: 03 Jun 2026, Venue: ALA 2026, Readers: Everyone, License: CC BY 4.0
Keywords: Distributional Reinforcement Learning, Heuristics for Reinforcement Learning, Learning Agents
TL;DR: Sample-efficient heuristic integration in distributional reinforcement learning
Abstract: Distributional Reinforcement Learning (DiRL) is a framework that models the full probability distribution of the action values, resulting in higher training stability and robustness. In standard RL, the agent often requires the aid of domain-specific heuristics to handle long planning horizons and sparse rewards. However, a study of the impact of heuristics in DiRL, and of how best to integrate them algorithmically, is still missing. This paper addresses this gap by proposing two methodologies for handling heuristics at the distributional level in DiRL: modifying the parameters of the distribution (shift-DiRL) or altering the full probability mass function (product-DiRL). Specifically, we base our research on the popular C51 algorithm for discrete-action domains, and then seek to extend our findings to DiRL in continuous action spaces. An empirical analysis over two discrete domains and one continuous domain shows the advantage of our methodologies over classical reward machines, even when the heuristics may be incorrect. Moreover, the superiority of product-DiRL, which alters the shape of the value distribution, highlights the promising role of the distributional representation for heuristic-guided RL.
Journal Edition Interest: Yes
Supplementary Material: pdf
Submission Number: 2
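
Illustrative sketch (not taken from the paper): the abstract names two ways of injecting a heuristic into a C51-style categorical value distribution, shifting its parameters or reshaping its probability mass function. The NumPy snippet below shows one plausible reading of each idea. The function names, the use of a scalar heuristic value for the shift, the use of a heuristic pmf over the same atoms for the product, and the smoothing constant `eps` are all assumptions made for illustration; the actual shift-DiRL and product-DiRL update rules may differ.

```python
import numpy as np


def project_onto_support(atoms, probs, support):
    """C51-style categorical projection.

    Each atom is clipped to [v_min, v_max] and its probability mass is
    split between the two nearest points of the fixed, evenly spaced support.
    """
    v_min, v_max = support[0], support[-1]
    delta = support[1] - support[0]
    b = (np.clip(atoms, v_min, v_max) - v_min) / delta  # fractional index
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    same = lower == upper  # atom falls exactly on a support point
    out = np.zeros(len(support), dtype=float)
    # np.add.at accumulates correctly when several atoms hit the same index.
    np.add.at(out, lower, probs * np.where(same, 1.0, upper - b))
    np.add.at(out, upper, probs * np.where(same, 0.0, b - lower))
    return out


def shift_distribution(probs, support, heuristic_value):
    """Hypothetical 'shift' variant: translate every atom by a scalar
    heuristic value, then project back onto the fixed support."""
    return project_onto_support(support + heuristic_value, probs, support)


def product_distribution(probs, heuristic_probs, eps=1e-8):
    """Hypothetical 'product' variant: multiply the value pmf elementwise
    by a heuristic pmf over the same atoms and renormalise.

    eps (an assumption) keeps atoms the heuristic rules out from being
    zeroed entirely, so an incorrect heuristic can still be overridden.
    """
    mixed = probs * (heuristic_probs + eps)
    return mixed / mixed.sum()


# Toy example: 11 atoms on [0, 10] with a uniform value distribution.
support = np.linspace(0.0, 10.0, 11)
probs = np.full(11, 1.0 / 11)

shifted = shift_distribution(probs, support, heuristic_value=2.0)

# Heuristic pmf that favours high returns (softmax over the atoms).
heuristic = np.exp(support - support.max())
heuristic /= heuristic.sum()
reshaped = product_distribution(probs, heuristic)
```

In this toy setting the shift only translates the distribution (clipping mass at the upper end of the support), while the product changes its shape, which is the distinction the abstract draws between the two approaches.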