Keywords: Reward Shaping
TL;DR: Capacity-aware reward shaping that normalizes replay rewards and applies an adaptively tuned rational activation to improve stability.
Abstract: Fixed environment rewards can lead to miscalibrated gradients, instability, and inefficient learning when signals are poorly scaled relative to the agent's updates. We introduce \textbf{Rational Reward Shaping (RRS)}, a reward transformation that converts raw rewards into normalized signals aligned with the agent's experience. RRS combines experience-normalized scaling with a monotone rational activation to reshape sensitivity and curvature while preserving reward order. It adapts automatically to changing reward regimes and integrates seamlessly into standard actor–critic updates by simply replacing the immediate reward in the target, requiring minimal code changes and no task-specific reward engineering. Across DDPG, TD3, and SAC on six MuJoCo benchmarks, RRS consistently improves average returns in both noiseless and perturbed-reward settings, with larger gains under noise, while incurring only 6\% average wall-clock overhead. RRS provides a general, plug-and-play method to produce better-calibrated reward signals, strengthening learning without modifying environment design. Source code is available at: \url{https://github.com/anonymouszxcv16/RRS}
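The abstract outlines the mechanism (experience-normalized scaling, a monotone rational activation, and substitution of the shaped reward into the actor–critic target) but does not give the exact functional forms. The sketch below is a minimal, illustrative Python rendering under assumptions: Welford running statistics over replayed rewards stand in for the experience normalization, and an order-preserving rational map $a z / (1 + b\,|z|)$ stands in for the rational activation. The class name `RationalRewardShaper`, the coefficients `a` and `b`, and the TD3-style usage are hypothetical, not the authors' implementation.

```python
import numpy as np

class RationalRewardShaper:
    """Illustrative sketch of RRS-style reward shaping (assumptions noted above).

    Rewards are normalized by running statistics of replayed experience, then
    passed through a monotone, bounded rational activation that preserves order.
    """

    def __init__(self, a=1.0, b=1.0, eps=1e-8):
        self.a, self.b, self.eps = a, b, eps      # hypothetical activation coefficients
        self.count, self.mean, self.m2 = 0, 0.0, 0.0  # running reward statistics

    def update(self, rewards):
        # Welford-style update of the running mean/variance over replayed rewards.
        for r in np.asarray(rewards, dtype=np.float64).ravel():
            self.count += 1
            delta = r - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (r - self.mean)

    def __call__(self, reward):
        # Experience-normalized scaling followed by a monotone rational activation.
        std = np.sqrt(self.m2 / max(self.count - 1, 1)) + self.eps
        z = (np.asarray(reward, dtype=np.float64) - self.mean) / std
        return self.a * z / (1.0 + self.b * np.abs(z))  # order-preserving, bounded

# Hypothetical usage inside an actor-critic update (e.g., a TD3-style target):
# the shaped reward replaces the immediate reward in the TD target.
#   shaper.update(batch_rewards)
#   target_q = shaper(batch_rewards) + gamma * (1.0 - done) * target_critic(next_s, next_a)
```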
Primary Area: reinforcement learning
Submission Number: 2835