Keywords: reward discounting, multi-agent RL, hyperbolic discounting
TL;DR: We introduce hyperbolic discounting in multi-agent reinforcement learning (MARL).
Abstract: Decisions often require balancing immediate gratification against long-term benefits. In reinforcement learning (RL), this balancing act is governed by temporal discounting, which quantifies how strongly future rewards are devalued. Prior research indicates that human decision-making aligns more closely with hyperbolic discounting than with the exponential discounting conventionally used in RL. As artificial agents become more capable and pervasive, particularly in multi-agent settings alongside humans, choosing an appropriate discounting model becomes critical. Although hyperbolic discounting has been proposed for single-agent RL, its potential in multi-agent reinforcement learning (MARL) remains unexplored. We introduce and formulate hyperbolic discounting in MARL, establishing theoretical and practical foundations across several frameworks, including independent learning, centralized policy gradient, and value decomposition methods. We evaluate hyperbolic discounting on diverse cooperative tasks against an exponential discounting baseline. Our results show that hyperbolic discounting achieves higher returns in 60% of scenarios and performs on par with exponential discounting in 95% of tasks, with significant improvements in sparse-reward and coordination-intensive environments. This work opens new avenues for robust decision-making in the development of advanced multi-agent systems.
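To make the contrast between the two discount models concrete, the sketch below compares a hyperbolically discounted return with an exponentially discounted one. It is illustrative only: the discount rate `k`, the discount factor `gamma`, and the function names are assumptions, not values or APIs from the paper. The final function uses the identity 1/(1 + kt) = ∫₀¹ γ^{kt} dγ, which underlies the mixture-of-exponentials approximation from prior single-agent work (e.g., Fedus et al., 2019).

```python
import numpy as np

# Hyperbolic discount: d(t) = 1 / (1 + k * t); larger k devalues the future faster.
# Exponential discount: d(t) = gamma ** t.
# k and gamma are illustrative defaults, not values taken from the paper.

def hyperbolic_return(rewards, k=0.1):
    t = np.arange(len(rewards))
    return float(np.sum(rewards / (1.0 + k * t)))

def exponential_return(rewards, gamma=0.99):
    t = np.arange(len(rewards))
    return float(np.sum(rewards * gamma ** t))

def hyperbolic_return_via_exponentials(rewards, k=0.1, n_gammas=100):
    # Midpoint-rule approximation of the integral over gamma in (0, 1):
    # averaging exponentially discounted returns over a grid of gammas
    # recovers the hyperbolic discount 1 / (1 + k * t).
    t = np.arange(len(rewards))
    gammas = (np.arange(n_gammas) + 0.5) / n_gammas
    returns = [np.sum(rewards * g ** (k * t)) for g in gammas]
    return float(np.mean(returns))

rewards = np.ones(50)  # a constant-reward episode for illustration
print(hyperbolic_return(rewards))                   # exact hyperbolic return
print(hyperbolic_return_via_exponentials(rewards))  # close approximation
print(exponential_return(rewards))                  # exponential baseline
```

The exact and approximated hyperbolic returns agree closely, which is why hyperbolic discounting can be implemented on top of standard exponentially discounted value estimators; how this carries over to independent learning, centralized policy gradient, and value decomposition methods is the subject of the paper.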
Submission Number: 32