Keywords: safe reinforcement learning, runtime behavior steering, state-wise constraint satisfaction, policy shaping, policy customization, norm compliance, trust region, AI alignment
TL;DR: We present State-wise Constrained Policy Shaping (SCPS), a principled, learning-free method for runtime policy augmentation that enforces state-wise safety constraints and steers agent behavior to comply with behavioral norms.
Abstract: While reinforcement learning can learn effective reward-maximizing policies, it remains difficult to encode complex behavioral preferences through reward engineering alone, especially in safety-critical applications. We present State-wise Constrained Policy Shaping (SCPS), a general-purpose algorithm for steering agent behavior at runtime that guarantees state-wise satisfaction of safety constraints whenever feasible and encourages compliance with behavioral norms. SCPS minimizes the expected norm-violation cost within a trust region around the original policy, balancing task performance against norm compliance. Behavioral norms are specified post-training as soft constraints, so the agent can adapt to evolving requirements without any additional learning. We evaluate SCPS with a Deep Q-Network in the HighwayEnv autonomous driving environment, where it reduces the collision rate by 97% and the norm-violation cost rate by 89% in-distribution relative to the base policy. SCPS also generalizes robustly under zero-shot evaluation, yielding substantial improvements in safety and norm compliance.
Submission Number: 23
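To make the abstract's shaping rule concrete, here is a minimal Python sketch for a discrete-action agent such as a DQN: mask out unsafe actions, restrict attention to a trust region around the base policy, then pick the remaining action with the lowest norm-violation cost. The trust region is expressed here as a tolerated Q-value gap `epsilon` below the best safe action; the paper may define it differently (e.g., over action probabilities), and all names (`q_values`, `norm_cost`, `safe_mask`, `epsilon`) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of runtime policy shaping for a discrete-action agent.
# Assumption: the trust region is a Q-value gap `epsilon` below the best
# safe action; the paper's actual trust-region definition may differ.
import numpy as np

def shape_action(q_values: np.ndarray,
                 norm_cost: np.ndarray,
                 safe_mask: np.ndarray,
                 epsilon: float = 0.5) -> int:
    """Pick the lowest-norm-cost action among safe actions whose Q-value
    lies within `epsilon` of the best safe Q-value.

    q_values:  (A,) Q-values from the frozen base policy (e.g., a DQN).
    norm_cost: (A,) expected norm-violation cost of each action in this state.
    safe_mask: (A,) boolean; False where a hard safety constraint is violated.
    """
    if not safe_mask.any():
        # No feasible safe action: fall back to the least costly action overall.
        return int(np.argmin(norm_cost))
    q_safe = np.where(safe_mask, q_values, -np.inf)
    # Trust region: keep safe actions close in value to the best safe action.
    trust_region = q_safe >= q_safe.max() - epsilon
    # Among trust-region actions, minimize expected norm-violation cost.
    cost = np.where(trust_region, norm_cost, np.inf)
    return int(np.argmin(cost))

# Example: action 0 is greedy but norm-violating; action 1 is near-optimal
# in value and norm-compliant, so it is selected.
q = np.array([10.0, 9.8, 4.0])
c = np.array([5.0, 0.0, 0.0])
safe = np.array([True, True, False])
print(shape_action(q, c, safe))  # -> 1
```

Because the base policy is never modified, this kind of shaping is learning-free: tightening or relaxing the norms only changes `norm_cost` and `safe_mask` at decision time.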