Keywords: Reward Shaping
TL;DR: Capacity-aware reward shaping that normalizes replay rewards and applies an adaptively tuned rational activation to improve stability.
Abstract: Fixed environment rewards can lead to miscalibrated gradients, instability, and inefficient learning when signals are poorly scaled relative to the agent's updates. We introduce \textbf{Rational Reward Shaping (RRS)}, a reward transformation that converts raw rewards into normalized signals aligned with the agent's experience. RRS combines experience-normalized scaling with a monotone rational activation to reshape sensitivity and curvature while preserving reward order. It adapts automatically to changing reward regimes and integrates seamlessly into standard actor–critic updates by simply replacing the immediate reward in the target, requiring minimal code changes and no task-specific reward engineering. Across DDPG, TD3, and SAC on six MuJoCo benchmarks, RRS consistently improves average returns in both noiseless and perturbed-reward settings, with larger gains under noise, while incurring only 6\% average wall-clock overhead. RRS provides a general, plug-and-play method to produce better-calibrated reward signals, strengthening learning without modifying environment design. Source code is available at: \url{https://github.com/anonymouszxcv16/RRS}
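The abstract outlines the mechanism (experience-normalized scaling, a monotone rational activation, and substitution of the shaped reward into the actor–critic target) but does not give the exact functional forms. The sketch below is a minimal, illustrative Python rendering under assumptions: Welford running statistics over replayed rewards stand in for the experience normalization, and an order-preserving rational map $a z / (1 + b\,|z|)$ stands in for the rational activation. The class name `RationalRewardShaper`, the coefficients `a` and `b`, and the TD3-style usage are hypothetical, not the authors' implementation.

```python
import numpy as np

class RationalRewardShaper:
    """Illustrative sketch of RRS-style reward shaping (assumptions noted above).

    Rewards are normalized by running statistics of replayed experience, then
    passed through a monotone, bounded rational activation that preserves order.
    """

    def __init__(self, a=1.0, b=1.0, eps=1e-8):
        self.a, self.b, self.eps = a, b, eps      # hypothetical activation coefficients
        self.count, self.mean, self.m2 = 0, 0.0, 0.0  # running reward statistics

    def update(self, rewards):
        # Welford-style update of the running mean/variance over replayed rewards.
        for r in np.asarray(rewards, dtype=np.float64).ravel():
            self.count += 1
            delta = r - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (r - self.mean)

    def __call__(self, reward):
        # Experience-normalized scaling followed by a monotone rational activation.
        std = np.sqrt(self.m2 / max(self.count - 1, 1)) + self.eps
        z = (np.asarray(reward, dtype=np.float64) - self.mean) / std
        return self.a * z / (1.0 + self.b * np.abs(z))  # order-preserving, bounded

# Hypothetical usage inside an actor-critic update (e.g., a TD3-style target):
# the shaped reward replaces the immediate reward in the TD target.
#   shaper.update(batch_rewards)
#   target_q = shaper(batch_rewards) + gamma * (1.0 - done) * target_critic(next_s, next_a)
```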
Primary Area: reinforcement learning
Submission Number: 2835