State-Wise Constrained Policy Shaping: Runtime Behavior Steering for Safe Reinforcement Learning

AAAI 2026 Workshop AIGOV Submission 23 Authors

20 Oct 2025 (modified: 25 Nov 2025) · CC BY 4.0
Keywords: safe reinforcement learning, runtime behavior steering, state-wise constraint satisfaction, policy shaping, policy customization, norm compliance, trust region, ai alignment
TL;DR: We present State-wise Constrained Policy Shaping (SCPS), a principled, learning-free method for runtime policy augmentation that enforces state-wise safety constraints and steers agent behavior to comply with behavioral norms.
Abstract: While reinforcement learning can learn effective policies for maximizing reward, it remains difficult to encode complex behavioral preferences through reward engineering alone, especially for safety-critical applications. We present State-wise Constrained Policy Shaping (SCPS), a general-purpose algorithm for steering agent behavior at runtime that guarantees state-wise safety constraint satisfaction when feasible and encourages compliance with behavioral norms. SCPS minimizes the expected norm violation cost within a trust region around the original policy, balancing task performance with norm compliance at runtime. Behavioral norms are specified post-training as soft constraints, enabling the agent to adapt to evolving requirements without any additional learning. We evaluate SCPS in the HighwayEnv autonomous driving environment using a Deep Q-Network, where it reduces the collision rate by 97% and the norm violation cost rate by 89% in-distribution relative to the base policy. SCPS also generalizes robustly under zero-shot evaluation, achieving significant improvements in safety and norm compliance.
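The abstract describes shaping the base policy at runtime by minimizing the expected norm-violation cost within a trust region around the original policy, while masking out hard-unsafe actions when a safe action exists. A minimal sketch of that idea for a discrete-action agent follows. It is not the paper's implementation: the softmax base policy over Q-values, the KL divergence as the trust-region measure, the `shape_policy` function, and the bisection on the tilting coefficient are all illustrative assumptions; the closed-form exponential tilt is the standard solution of a linear objective under a KL constraint.

```python
import math

def softmax(xs, temp=1.0):
    # Base action distribution from Q-values (an assumption; a DQN
    # itself acts greedily, so this soft relaxation is illustrative).
    m = max(xs)
    es = [math.exp((x - m) / temp) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def kl(p, q):
    # KL(p || q) over actions, skipping zero-probability entries.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def shape_policy(q_values, norm_costs, unsafe, delta=0.1, temp=1.0):
    """Hypothetical runtime shaping step: mask hard-unsafe actions when
    feasible, then minimize expected norm-violation cost subject to
    KL(shaped || base) <= delta."""
    p0 = softmax(q_values, temp)
    if any(unsafe) and not all(unsafe):
        # State-wise hard constraint: zero out unsafe actions, renormalize.
        p0 = [0.0 if u else p for p, u in zip(p0, unsafe)]
        s = sum(p0)
        p0 = [p / s for p in p0]

    def tilt(lam):
        # Closed-form minimizer of <p, c> under a KL ball around p0:
        # p ∝ p0 * exp(-lam * c).
        w = [p * math.exp(-lam * c) for p, c in zip(p0, norm_costs)]
        s = sum(w)
        return [x / s for x in w]

    # Find the largest tilting coefficient that stays inside the
    # trust region, by doubling then bisection on the KL budget.
    lo, hi = 0.0, 1.0
    while kl(tilt(hi), p0) < delta and hi < 1e6:
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if kl(tilt(mid), p0) <= delta:
            lo = mid
        else:
            hi = mid
    return tilt(lo)
```

The shaped distribution stays within the KL budget of the (masked) base policy while shifting probability mass away from norm-violating actions; a larger `delta` trades task fidelity for norm compliance, matching the runtime balance the abstract describes.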
Submission Number: 23