Norm Compliance in Reinforcement Learning Agents via Restraining Bolts
Abstract: We modify the restraining bolt technique, originally designed for safe reinforcement learning, to regulate agent behavior in alignment with social, ethical, and legal norms. Rather than maximizing rewards for norm compliance, our approach minimizes penalties for norm violations. We demonstrate in case studies the effectiveness of our approach in capturing benchmark challenges in normative reasoning like contrary-to-duty obligations, exceptions, and temporal obligations.
Loading