ARS: Adaptive Reward Scaling for Multi-Task Reinforcement Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · License: CC BY-NC-SA 4.0
TL;DR: We propose Adaptive Reward Scaling (ARS) for multi-task RL, which balances rewards across tasks and includes periodic network resets to enhance stability and efficiency, outperforming baselines on Meta-World and excelling on complex tasks.
Abstract: Multi-task reinforcement learning (RL) faces significant challenges due to varying task complexities and heterogeneous reward distributions across tasks. To address these issues, we propose Adaptive Reward Scaling (ARS), a novel framework that dynamically adjusts reward magnitudes and leverages a periodic network reset mechanism. ARS introduces a history-based reward scaling strategy that balances reward distributions across tasks, enabling stable and efficient training. The reset mechanism complements this approach by mitigating overfitting and ensuring robust convergence. Empirical evaluations on the Meta-World benchmark demonstrate that ARS significantly outperforms baseline methods, achieving superior performance on challenging tasks while maintaining overall learning efficiency. These results validate ARS's effectiveness on diverse multi-task RL problems, paving the way for scalable solutions in complex real-world applications.
Lay Summary: How can we train a single agent to tackle many tasks that each give very different rewards? We introduce Adaptive Reward Scaling (ARS), a simple method that watches past rewards and automatically rescales them so no task dominates learning. At the same time, ARS periodically “restarts” the agent’s network weights to prevent overfitting to easy tasks. In experiments on the Meta-World robot benchmark, this combination lets a single policy learn faster, solve harder tasks, and stay stable, showing that balancing reward scales and occasional resets can make multi-task reinforcement learning both more efficient and more reliable.
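Code sketch (illustrative only): the two mechanisms described above can be summarized roughly as follows. This is a minimal sketch, assuming a per-task running estimate of reward magnitude as the scaling statistic and a fixed reset interval; the paper's exact scaling rule and reset schedule are not given on this page, and all names below (AdaptiveRewardScaler, maybe_reset, reset_interval) are hypothetical.

import numpy as np
import torch.nn as nn

class AdaptiveRewardScaler:
    """History-based reward scaling: rescale each task's reward by a
    running estimate of its magnitude so no task dominates learning.
    (Illustrative sketch; not the paper's exact rule.)"""

    def __init__(self, num_tasks: int, eps: float = 1e-8):
        self.count = np.zeros(num_tasks)
        self.mean_abs = np.zeros(num_tasks)  # running mean of |reward| per task
        self.eps = eps

    def scale(self, task_id: int, reward: float) -> float:
        # Incrementally update the running mean of |reward| for this task.
        self.count[task_id] += 1.0
        self.mean_abs[task_id] += (abs(reward) - self.mean_abs[task_id]) / self.count[task_id]
        # Divide by the running magnitude so rewards are comparable across tasks.
        return reward / (self.mean_abs[task_id] + self.eps)

def maybe_reset(network: nn.Module, step: int, reset_interval: int = 200_000) -> None:
    """Periodic network reset: re-initialize all resettable layers every
    reset_interval steps to mitigate overfitting (interval is hypothetical)."""
    if step > 0 and step % reset_interval == 0:
        for module in network.modules():
            if hasattr(module, "reset_parameters"):
                module.reset_parameters()

In a training loop, scaler.scale(task_id, reward) would replace the raw reward in the RL update, and maybe_reset(policy_net, step) would be called once per environment step.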
Primary Area: Reinforcement Learning->Deep RL
Keywords: reinforcement learning, multi-task reinforcement learning, reward scaling
Flagged For Ethics Review: true
Submission Number: 10128